ABSTRACT



In today’s DoD software environment, where systems of enormous size, complexity and cost 
are the norm, economic conditions are driving DoD system developers to seek ways to increase 
productivity while decreasing product defects. To achieve its goals, DoD has taken the approach of 
integrating reuse into the software development process. In 1992, DISA established its Software 
Reuse Program to serve as a prototype for the DoD-wide reuse initiative. This thesis will look at 
DISA’s effort to support DoD’s reuse vision. Specifically, it will discuss DISA’s software reuse 
library management and will introduce a methodology for the collection and analysis of metrics 
relating to software performance in order to improve library software quality. This thesis concludes 
that metrics can play a key role in any organization’s software quality program. While metrics alone 
are not a solution to the reuse quality problem, they are a tool to be used prudently by the software 
quality manager to manage and improve the quality of organizational software. 

I. THE SOFTWARE DILEMMA

A. DOD AND INDUSTRY OVERVIEW 

From most accounts, the software industry has been 
experiencing a "software crisis" since the late 1960s. David 
Fisher (Institute for Defense Analysis Report P-1191, 1976) 
offers the following characteristics of the crisis: 

• Software often fails, leading to poor reliability.

• Software development costs are unpredictable.

• Software is delivered late and is often only quasi-functional.

• Software is seldom portable among domains. 

Because software often exhibits the characteristics described
above, industry reportedly spends anywhere from 40% to 70% of
its computing budget on software maintenance (Booch, 1987).
As a result, efforts to improve software quality and
reliability have become a focus of software engineering in
both government and industry.

In today's DoD software environment, where systems of 
enormous size, complexity and cost are the norm, economic 
conditions are driving DoD system developers to seek ways to 
increase productivity while decreasing product defects. DoD 
can no longer afford to invest substantial sums of money 
fielding complex weapons or MIS systems that fail to meet 
operational expectations due to poor design and reliability.
To counter the growing problem of costly, poorly performing
software systems, DoD and its civilian contractors
have launched a concerted effort to improve the quality of
software by better managing its design and maintenance
process.

B. DOD'S RESPONSE TO THE CRISIS

On July 15, 1992, the Department of Defense issued a
document entitled "DoD Software Reuse Initiative Vision and
Strategy," based on the premise that system architectures,
designs, test plans and software can be "reused" as part of
new systems development in order to improve quality and
reliability while lowering the cost of software-intensive
systems. In establishing the framework for this reuse effort,
DoD set four specific goals (DoD Reuse Executive Steering
Committee, 1992):

• Increase software quality and reliability.

• Improve the management of software technical risk. 

• Shorten system development time. 

• Increase productivity. 

Reuse, in this context, means incorporating existing software
and its supporting products into new systems rather than
developing them anew. To achieve its
goals, DoD has taken the approach of integrating reuse into
the software development process. As part of its strategy to
measure progress towards these goals, DoD has established a
pilot metrics program which outlines data to be collected in
all software reuse activities in order to support its
management and control objectives. Specifically, DoD proposes
that metrics be defined, collected, and analyzed in order to
measure the degree of reuse success (DoD Reuse Executive
Steering Committee, 1992).

Presently, a software reuse metrics plan has been 
developed for DoD; Chapter IV references the contents of this 
plan as part of its discussion on establishing a software 
quality measurement program. 

C. CURRENT DOD EFFORTS 

To establish the foundation for reuse employment, a number 
of federal and civilian agencies have embarked on an effort to 
build "libraries" of reusable software components that can be 
retrieved and integrated into new systems development. The 
success of their effort depends, in part, on two things:
first, the library must collect reusable components that
satisfy the requirements of major library users; second,
these components must be of the highest quality. As
Sommerville (Sommerville, 1992) suggests, a successful
software reuse library must satisfy four customer
requirements:

• It must contain software of value to the customer. 

• The software and documentation must be understandable. 

• The customer must be confident using the software. 

• The software must include information on its reuse.

Of interest in this thesis is the third requirement. The
best collection of reusable software is useless unless the
customer is confident that using it will yield some
quantitative benefit (e.g., increased productivity, reduced
development costs or improved system quality).

Today, all services as well as other select government and 
non-governmental agencies have operational reuse libraries and 
are developing and implementing reuse practices. Among the 
services, all report a reduction in systems development and 
testing time with associated increases in system quality where 
reuse practices have been implemented (Foreman, 1993). One of
the key governmental agencies involved in this effort is the
Defense Information Systems Agency (DISA).

In 1992, DISA, with the support of the Joint 
Interoperability Engineering Organization (JIEO) and the 
Center for Information Management (CIM), established its
Software Reuse Program to serve as a prototype for the
DoD-wide reuse initiative (DISA/JIEO/CIM Software Reuse
Program, 1993). This thesis will look at DISA's effort to
support DoD's reuse vision. Specifically, it will discuss
DISA's software reuse library management and will introduce a
methodology for the collection and analysis of metrics
relating to software performance in order to improve library
software quality.

While reusable software can be written in any of a variety of
programming languages, this study is directed towards Ada
software components for the following reasons:

• Ada is the standard DoD systems development language; 
hence, it is the logical language on which to base any 
software quality improvement efforts. 

• DISA has developed specific measurement and certification 
processes for Ada software and has automated tools for Ada 
source code analysis. 



Chapter II continues this discussion. 

II. DISA'S REUSE PROGRAM



A. INTRODUCTION 

The reasons supporting the development of a library of
reusable software components were outlined in Chapter I. As
discussed, DISA along with JIEO and CIM established the 
Software Reuse Program as a means to support DoD's reuse 
initiative. Presently, DISA operates its own reuse library, 
the Defense Software Repository System (DSRS) , which serves as 
a repository of software assets for use by DoD customers. 
Additionally, DISA supports a number of distributed Software 
Reuse Support Centers (SRSCs) which have agreed to provide 
local support for the integration of reuse practices 
throughout DoD (DISA/JIEO/CIM Software Reuse Program, 1993).

DISA's main objective for its DSRS is to provide a source
of reusable software components, called Reusable Software
Components (RSCs) in DISA terminology, for use by program and
domain^ managers in their systems development efforts. RSCs
include products from all phases of the software development
cycle, including requirements, design and
testing documentation, as well as source code and manuals.



^A domain is a set of software systems with common features and
functionality. The set may be horizontal (e.g., aircraft navigation
system) or vertical (e.g., radar software) (Ogush, 1992).

The process of building a library of reusable components
begins with a domain analysis study involving major potential 
library users in order to determine the type and identity of 
assets they need. Based on domain analysis results, the 
library starts collecting candidate RSCs, evaluating their 
reusability potential, certifying them and installing them in 
the library for use (DISA/CIM Software Reuse Program, 1993).
This process is outlined briefly in the following sections. 
While this discussion is general in nature, specific 
references to Ada RSC processing are made occasionally in 
order to foster a better understanding of the theme of this 
thesis. 

B. RSC PROCESSING OVERVIEW 

Reuse library development begins with a study of customer 
needs. Through domain analysis and liaisons with major 
potential library users, the library determines which assets 
will satisfy user requirements. Once specific asset types are 
identified, library efforts focus on collecting and cataloging 
these RSCs. As previously stated, RSCs include not only 
source code but also supporting design, development and 
testing documentation. 

RSC collection is an ongoing process that relies on both 
government and private industry sources for component 
contributions. Cataloging these components for library use 
involves identifying RSCs with potential for reuse, analyzing 
their code and documentation, and then certifying them
accordingly. The net result of the cataloging process is a
certified RSC ready for induction into the reuse library.
Because the process of collecting and analyzing candidate RSCs
is lengthy, it is not discussed in detail here.
Rather, the RSC certification process is the focus
of this study.

C. RSC CERTIFICATION 

Between the collection and certification phases, each RSC
is analyzed and evaluated to determine whether it meets the
criteria for certification or requires re-engineering to
bring it up to library specifications. If re-engineered, the
RSC is subjected once again to the analysis and evaluation
process. Once this process is complete and documented, the
component is ready for certification, which
involves assigning a level of completeness to the RSC based on
the following schema (DISA/CIM Software Reuse Program, 1993):

• Level 1 - Completeness and functionality of RSC are
unknown. No measures of quality are provided.

• Level 2 - RSC completeness is assured. Code compiles if 
provided. No testing or user manual given. 

• Level 3 - RSC is complete and complies with reusability 
criteria. Testing occurs and results are provided. 

• Level 4 - RSC is complete, meets reusability requirements, 
is tested and user manual is provided. Highest degree of 
confidence in RSC quality. 
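
The four-level schema above can be read as a small decision procedure. The following is a minimal sketch in Python, not DISA's actual certification tooling; the `RSCEvidence` record, its field names and the `certify` function are hypothetical names introduced here only to illustrate how the levels nest.

```python
from dataclasses import dataclass
from enum import IntEnum


class CertificationLevel(IntEnum):
    """Certification levels as summarized in the list above."""
    LEVEL_1 = 1  # completeness and functionality unknown
    LEVEL_2 = 2  # completeness assured; code compiles if provided
    LEVEL_3 = 3  # complete, meets reusability criteria, test results provided
    LEVEL_4 = 4  # Level 3 plus a user manual


@dataclass
class RSCEvidence:
    """Hypothetical record of what is known about a candidate RSC."""
    complete: bool = False
    compiles: bool = False  # assumption: set True when no code is supplied
    meets_reuse_criteria: bool = False
    test_results_provided: bool = False
    user_manual_provided: bool = False


def certify(evidence: RSCEvidence) -> CertificationLevel:
    """Return the highest level whose conditions are all satisfied."""
    if not (evidence.complete and evidence.compiles):
        return CertificationLevel.LEVEL_1
    if not (evidence.meets_reuse_criteria and evidence.test_results_provided):
        return CertificationLevel.LEVEL_2
    if not evidence.user_manual_provided:
        return CertificationLevel.LEVEL_3
    return CertificationLevel.LEVEL_4
```

Note that every input to this sketch concerns what accompanies the component, not how the component behaves in operation.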

RSC processing concludes with the installation of the RSC and
its supporting documentation into the reuse library. While 
DISA's RSC certification process provides useful information 
to the library user, it has its limitations. 

D. CERTIFICATION LIMITATIONS 

As described above, RSC certification levels indicate the 
amount of supporting documentation available for the RSC 
rather than a quantifiable measure of expected quality 
(Merritt, 1993). For Ada RSCs, some indicative measures of
quality are derived from the statistical metric data collected 
on the actual software code using the automated analysis tool 
AdaMAT/D. While this type of analysis may provide some 
general indication of quality, a more quantitative approach is 
desirable in order to satisfy customer expectations. 

E. CUSTOMER EXPECTATIONS

In theory, a customer of a software reuse library expects, 
either implicitly or explicitly, a certain degree of quality 
from an RSC drawn from the library. This idea is embodied
in the characteristics identifying a successful reuse library
as outlined by Sommerville (Sommerville, 1992). While static
measures, such as those collected by AdaMAT/D, provide a 
general qualitative basis for determining expected quality, 
they are not sufficient to predict software behavior once that 
software is subjected to a variety of domains and operating
environments. Customer expectations coupled with DoD
initiatives for the reuse of software in DoD projects make the 
issue of quality a realistic concern. 

As a first step towards meeting customer quality demands, 
this thesis proposes the development and integration of a 
metrics program which will support quantitative quality 
management of reusable software components. Chapter III 
discusses this proposal in detail. 

III. TOWARDS BETTER QUALITY



A. BACKGROUND 

Software quality can be discussed primarily in two 
contexts: the process and the product. Dunn and Ullman (Dunn 
and Ullman, 1982) make the observation that in the 1970s, the 
software industry often perceived quality in the context of 
the product, as a post production inspection function; this 
perception still exists in many DoD organizations today. For 
many DoD contractors, the opposite is true; perhaps no single 
factor of software development is given more of their 
attention than quality improvement in the production process. 
Driven by current economic conditions, DoD no longer has the 
luxury of developing and implementing software systems with a 
high probability of defects and associated high maintenance 
costs to fix them. Rather, DoD must now ensure that the 
highest standards of quality are applied throughout the 
production process in order to minimize the risk of poor 
quality in the finished product. 

B. THE QUALITY OBJECTIVE 

It is reasonable to assume that the successful achievement 
of quality improvement in software products lies in the 
ability of DoD and industry to take a proactive role in 
managing the development process. In their text on software 
reliability, Musa et al. (Musa et al., 1987) identify three
user-oriented characteristics related to the software product:
cost, schedule and quality. As they point out, of these 
characteristics, quality is the only aspect of a product that 
cannot be given a quantitative measure. They go on to suggest 
that reliability is an intrinsic characteristic of quality 
that subsumes many of the other properties normally associated 
with the term quality. 

To the user, software reliability means that a given 
program will operate for a period of time without failure; or 
conversely, that a program behaves as intended for an 
indefinite period of time. Musa et al. believe that, because
reliability relates to the operation of software, it most
appropriately supports the user's idea and view of software
quality. For that reason, they propose that reliability
measurement:

• Is customer-oriented rather than developer-oriented.

• Relates to the operation rather than the design of software.

• Accounts for the frequency of problems. 

• Is suitable for predicting trends. 

This discussion concludes by suggesting that reliability 
measurements (e.g., time to failure, failure count) can play 
an integral role in the movement towards the quality 
objectives of the reuse initiative. This role and its 
application in the reuse library environment are discussed in 
the following section. Note that for the remainder of this
thesis, the term reliability is used as a specific instance
of the more general term quality.

C. THE REUSE LIBRARY'S ROLE 

Because the reuse library is primarily a repository of 
software components, its part in facilitating quality 
improvement may not be understood. Clarity on this issue may 
be gained by discussing the present and potential roles of the 
reuse library in meeting quality objectives. 

1. PRESENT ROLE

As discussed in Chapter II, DISA currently manages RSC 
quality through its component certification process which 
provides the user with only an indication of the component's 
level of completeness. For Ada components, a qualitative 
measure of quality is provided through data collected as part 
of the static analysis done on the code; in any case, the user 
gets no guarantee of the software's behavior once it is placed 
in operation. 

2. POTENTIAL ROLE

DISA can expand its present role in providing quality 
software by adding a metrics program to monitor and evaluate 
the reliability of software before and after it is fielded in 
an operational environment. Presently, a number of 
organizations in the software industry have successfully 
developed and implemented programs which support this type of 
quality analysis. This thesis will draw on the best of those
methodologies to develop a similar program for the reuse 
library at DISA. 

D. SOFTWARE QUALITY PROGRAM PROPOSAL

The purpose of this thesis, as stated earlier, is to 
outline a methodology, based on the successful efforts of 
other organizations such as NASA, for collecting, analyzing, 
and applying operational and metric data from reusable 
software components in order to establish a quantitative 
basis for predicting their reliability. Again, this effort is 
directed towards Ada software with the added limitation of 
applicability to stand-alone, functional RSCs as opposed to 
integrated application programs. 

E. THESIS OBJECTIVE 

The objective of this study is to propose a plan by which 
DISA can broaden the role of software quality management in 
its reuse libraries; this plan includes procedures for 
collecting and analyzing user reliability data on reuse 
components. Ideally, this plan will provide a supplementary 
link to the current library management process that will 
enable library asset managers to better quantify and predict 
the reliability of library software over time. 

By way of introduction, the idea behind this plan is to
develop a quality measurement methodology that will support
the capture and analysis of software metrics as well as
operational data in order to produce some quantitative measure
of expected quality, or more specifically, reliability.
Figure 3.1^ provides an overview of this methodology.




Figure 3.1 Overview of a Software Quality Measurement 
Methodology 



As Figure 3.1 illustrates, at the core of this methodology 
is a reliability knowledge database which serves as a 
receptacle for RSC measurement and operational (defect) data. 
As the reuse library and reliability knowledge database
mature, software quality managers will be able to apply past
experiences with software reliability to the analytical
processing of new library components.

^Adapted from an illustration by Sheldon et al. (Sheldon et
al., 1992).

As this figure shows, this methodology is not intended to 
be a static, one-time application analysis; this type of 
analysis currently exists. Rather, it will serve as a basis 
for a continual, evolutionary program for improving software 
quality. The development of a quality measurement program to 
support this study's objective is the subject of the remainder 
of this thesis. 

IV. DEVELOPING A SOFTWARE QUALITY MEASUREMENT PROGRAM



A. BACKGROUND 

As discussed in the previous chapters, improving the
reliability of software is an integral part of improving
overall software quality, and the key to improving
reliability is the establishment of a methodology to capture
and analyze metrics which support quality management.
Schneidewind^ (Schneidewind, 1993), in his discussion on the
methodology of metrics, outlines five steps to follow when
implementing a software quality measurement program:

• Define software-quality requirements.

• Select potential software-quality metrics.

• Design and implement a metrics plan.

• Analyze metrics data.

• Validate original software-quality metrics.

Schneidewind points out that these steps form the basis for an 
iterative process which involves analyzing and adjusting 
measures as needed. 



^Dr. N. F. Schneidewind has worked extensively in the field of
software reliability modelling. He is the developer of the
Schneidewind Software Reliability Model used by IBM-Houston to
predict software reliability for the NASA Space Shuttle flight
software.

David Siefert (Siefert, 1989), who has researched the
implementation of software reliability measurement programs in
organizations, contributes a model of this process. Figure
4.1 (adapted from Siefert's model) combines the ideas of
Schneidewind and Siefert to provide a graphical process
representation.




Figure 4.1 Quality Measurement Program Methodology 

This measurement methodology, which adheres to IEEE
standards, provides the framework for a discussion of DISA's
organizational goals and the development of a software quality
measurement program to support these goals. Before
doing so, however, it is necessary to look at DISA's current
measurement efforts.

B. DISA'S CURRENT SOFTWARE REUSE METRICS PROGRAM

To manage its software reuse program, DISA, in concert
with JIEO and CIM, developed the Software Reuse Metrics Plan
(DISA/JIEO/CIM Software Reuse Program, 1993). This plan,
designed in response to DoD's proposal to define metrics which
can be used to measure reuse success (DISA/CIM Software Reuse
Program, 1993), outlines the requirements for identifying,
collecting and reporting metrics for management analysis of
the DoD software reuse program. Specifically, it provides
Project Managers, Domain Managers, Repository Managers and DoD
Executives with a process that will enable them to measure and
manage software reuse in their area of responsibility.

Due to its broad scope of application, this plan focuses
on high-level process metrics rather than on the product
metrics suitable for the software quality analysis focus of
this study. Therefore, this thesis will use IEEE standards
and the methodology diagrammed in Figure 4.1 to outline a new,
supplementary program for quality measurement in the reuse
library environment. The framework for this program is
discussed next.

C. PROGRAM FRAMEWORK

Figure 4.1 diagrams the methodology for establishing a 
software quality program. The remainder of this chapter 
focuses on the first three steps in this methodology by
discussing an organizational strategy for DISA as well as 
software quality requirements and supporting metrics. Chapter 
V continues this discussion by addressing the metric 
collection process. Chapter VI completes this study by 
addressing the analysis and use of operational data as it 
relates to improving software quality in the reuse library 
environment.

1. ORGANIZATIONAL STRATEGY 

DoD, as part of its organizational strategy to 
implement systematic reuse (DoD Reuse Executive Steering 
Committee, 1992), called for the establishment of metrics
collection procedures to measure reuse effectiveness. DISA, 
in support of this DoD directive, developed its software 
metrics plan (DISA/CIM Software Reuse Program, 1993). 
Contained in this plan are the elements of its three-phase 
organizational strategy; these phases, listed in decreasing 
priority, are: 

• Phase I - Focus on developing the reuse library. 

• Phase II - Examine the cost and benefits of library use. 

• Phase III - Address the technical aspects of reuse.



Phase III of DISA's strategy targets the technical
issues of reuse. For the reuse library manager, this means
identifying which design quality metrics are most useful in
providing the best indication of software's suitability for
reuse (DISA/CIM Software Reuse Program, 1993). Therefore, one
element of DISA's organizational strategy can be stated as
follows:

• To improve the quality of the software stored in the reuse 
library. 

Based on this strategy, organizational software quality goals 
can be developed. 

2. GOAL IDENTIFICATION

Before software metric selection can be considered, an 
organization must first identify its quality requirements. 
For DISA, a general quality requirement is that its library 
offer RSCs that customers can integrate into their 
applications to reduce costs and development time. According 
to IEEE (IEEE, 1992), quality requirements should be expressed
in one of two forms: 

• Direct metric value - a quantitative value which provides 
a direct measure of some characteristic of software quality^. 



^For instance, defect-report-count might be used as a direct
measure of RSC reliability.

• Predictive metric value - a quantitative value used to
predict some characteristic of software quality^.

For this study, reliability has been identified as the
characteristic of software quality of interest. Ideally, the
use of direct metrics is desirable; however, this assumes that
this type of data is available. In the reuse library
environment, this ideal is met when an RSC arrives having
been thoroughly tested, so that this information has already
been gathered. In
other cases, when testing is incomplete or the validity of the
testing is in question, predictive metrics can be used.

For the reuse library, direct metrics appear to be the 
most suitable form for expressing quality requirements based 
on the fact that the library's primary role is that of a 
software repository and not a software development facility. 
However, this assumption in no way precludes the use of 
predictive metrics. In fact, as will be discussed later, 
predictive metrics that have been validated can serve as valid 
indicators of software reliability until direct metrics are 
available; application of both metric types is addressed in 
this study. 

^For instance, number-of-statements might be used as a
predictor of RSC reliability.

As illustrated in Figure 4.1, the development of a
good metrics program is an iterative process where metrics are
selected, applied and evaluated for suitability. In the
interest of establishing a baseline for the start of this
process, the proposed goal for DISA's reuse library can be
formally stated as follows:

• To achieve a zero defect-report-count for library reusable
software components. 

A prerequisite for achieving this goal is that a particular
software component has undergone rigorous inspection and
testing; without it the goal is met trivially, since a
component that was never tested also has a zero
defect-report-count. The next section will
discuss metrics to support this goal.
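
The goal and its prerequisite can be expressed together as a single check. The following Python sketch is illustrative only; `RSCRecord`, its fields and `meets_zero_defect_goal` are hypothetical names, and how "rigorous inspection and testing" is established is left outside the sketch as a simple flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RSCRecord:
    """Hypothetical library record for one reusable software component."""
    name: str
    inspected_and_tested: bool  # assumption: set by the library's own review
    defect_reports: List[str] = field(default_factory=list)

    @property
    def defect_report_count(self) -> int:
        """Direct metric: number of discrepancy reports filed against the RSC."""
        return len(self.defect_reports)


def meets_zero_defect_goal(rsc: RSCRecord) -> bool:
    """A zero count only satisfies the goal if the RSC was actually tested."""
    return rsc.inspected_and_tested and rsc.defect_report_count == 0


# Hypothetical records for illustration.
untested = RSCRecord("stack_pkg", inspected_and_tested=False)
tested_clean = RSCRecord("queue_pkg", inspected_and_tested=True)
tested_faulty = RSCRecord("sort_pkg", True, ["DR-1: wrong result on empty input"])
```

Here the untested component has a defect-report-count of zero yet does not meet the goal, which is the distinction the paragraph above draws.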

3. METRICS SELECTION

Before embarking on metrics selection, it is important
to first define what a metric is and what it does. Reindollar
(Reindollar, 1993) suggests that a metric is a tool to be used
by managers to determine their progress towards meeting a
specified goal. In the abstract sense this is true, but a more
formal definition is desirable. Recall that a metric can be
either direct or predictive. While the definition of a direct
metric is straightforward, that of a predictive metric
requires more detail.

Schneidewind defines a predictive metric^ as a
function that inputs software data and returns a single
numerical result. He uses cyclomatic complexity as an example,
where the formula M = e - n + 2 is the metric function. In
this function, e (edges) and n (nodes), counted on the module's
control flow graph, represent software
input data and the resultant value M represents the output.
The significance of this definition is that the single
numerical output of the metric function allows the user to
compare a software component or module against a standard.
For example, if M_1 = 2 for module one and M_2 = 3 for module
two, there exists a quantitative basis for comparative
analysis of the two modules if M has been previously validated
against a quality factor like reliability. Finally, provided
a predictive metric is valid (as will be discussed), it can
serve as a substitutive approximation of the desired quality
characteristic. This feature of a predictive metric is
particularly desirable during the software development process,
where the characteristics of quality, such as reliability,
cannot be determined until development completion
(Schneidewind, 1992).

^Schneidewind uses the term "quality metric" in his writings.
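
The metric function M = e - n + 2 described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the control flow graph is a single connected graph with one entry and one exit, supplied as a list of directed edges; the graph encoding and node names are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def cyclomatic_complexity(edges: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Compute M = e - n + 2 for one connected control flow graph."""
    edge_list = list(edges)
    nodes = {node for edge in edge_list for node in edge}
    return len(edge_list) - len(nodes) + 2


# Module one: a single if/else (4 edges, 4 nodes).
module_one = [
    ("entry", "then"), ("entry", "else"),
    ("then", "exit"), ("else", "exit"),
]

# Module two: two consecutive if statements without else branches
# (6 edges, 5 nodes).
module_two = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"), ("c", "e"), ("d", "e"),
]

m1 = cyclomatic_complexity(module_one)  # 4 - 4 + 2 = 2
m2 = cyclomatic_complexity(module_two)  # 6 - 5 + 2 = 3
```

The two hypothetical modules reproduce the values M_1 = 2 and M_2 = 3 used in the text. As noted there, the comparison is only meaningful once M has been validated against a quality factor.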

With the concept and application of metrics better 
defined, the determination of the metrics suitable for DISA's 
quality measurement program can be made. While any number of 
metrics, such as complexity metrics, can be used as indicators 
of reliability, the following two metrics have been selected 
for purposes of illustration in this study: 

• Direct Metric - defect-report-count (a count of all
discrepancies related to any portion of an RSC)

• Indirect (Predictive) Metric - number-of-statements (the total
statement count of the code portion of an RSC)

One reason for illustrating the methodology with
number-of-statements is that, in the final analysis, many
other metrics have been shown to be highly associated with
program size.
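
One common way to quantify how strongly a predictive metric is associated with a direct metric is a rank correlation. The Python sketch below computes the Spearman coefficient between number-of-statements and defect-report-count. The choice of Spearman correlation is an assumption made for illustration rather than a method prescribed by the sources cited here, and the data values are invented.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant data")
    return cov / (var_x * var_y) ** 0.5


# Invented data for four RSCs: statement counts and defect report counts.
statements = [120, 340, 560, 900]
defects = [0, 1, 3, 7]
rho = spearman(statements, defects)
```

A coefficient near 1 on real library data would support using number-of-statements as a stand-in until defect data accumulate; a coefficient near 0 would argue against it.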

As mentioned, more than one metric of each type is
suitable for use. In fact, Siefert (Siefert, 1989) in his
research identifies and ranks 15 "Best of Class" metrics based
on frequency-of-use, importance, ease-of-use and
ease-of-implementation as reported by the software industry. He
suggests that an organization developing a quality measurement 
program select two or three of these metrics based on their 
meaningfulness to the organization. A list of these metrics 
is provided in Appendix A. 

Figure 4.2 (adapted from IEEE) depicts the 
hierarchical relationship between software quality, quality 
factors and direct and indirect metrics. 




* Indicates one or more metrics can be used

Figure 4.2 Metrics Hierarchical Tree (IEEE, 1992)

Examination of this hierarchical tree shows that software
quality is composed of a number of quality factors, where each
factor has one or more direct metrics representing it.
Connected with each quality factor are one or more predictive
metrics which serve as substitutes for direct metrics when
the latter cannot be used. The reader is directed to IEEE's
standard on quality metrics (IEEE, 1992) for further
information on the subject.

D. METRICS IMPLEMENTATION, ANALYSIS AND VALIDATION 

With suitable metrics selected to support software quality 
management, the next step in the measurement program 
methodology is the implementation and subsequent analysis and 
validation of these metrics. Chapter V continues with program 
implementation.

V. PROGRAM IMPLEMENTATION



A. BACKGROUND 

At this point, the need for improving software quality has 
been identified. Likewise, DISA's role in developing a reuse 
library to support DoD software quality initiatives has been 
discussed. Chapter IV suggested a software quality goal for 
DISA as well as a framework for establishing a reliability 
measurement program to support that goal; this chapter 
addresses the program's implementation. Before continuing, it 
is useful to establish a clear understanding of the 
terminology that will be used. 

B. TERMINOLOGY 

For clarity in this discussion, the following IEEE
software definitions are provided (IEEE, 1990):

• Error - a logical, syntactic or clerical discrepancy
introduced in the software during the design process.

• Fault - an unintended functioning of software due to one
or more errors.

• Failure - unexpected results from software as a
consequence of one or more software faults.

• Defect - for purposes of this thesis, defect is synonymous
with fault.

In the software reuse environment, the greatest potential
for defects exists during either the interfacing or the
adaptation of reusable software. RSCs are similar to
commercial off-the-shelf software products; therefore, their
successful use depends upon the user's understanding of their
interface requirements and intended applications. Since the
interest of this study lies in minimizing the risk inherent in
the RSC itself, defects related to user misunderstanding of the
RSC are generally not of interest. The next section continues
with a discussion of data collection.

C. THE NEED FOR COLLECTING SOFTWARE DATA 

To support a software metrics program, two types of data
are needed: metric data relating to the software itself (e.g.,
number-of-statements) and defect data relating to the
operation of the software (e.g., defect-report-count).

For this study, there are two⁷ specific reasons for
collecting software defect data: to support direct metric
assessment of software quality and to validate the suitability
of predictor metrics. For example, defect-report-count can be
used directly by the Software Quality Manager (SQM) to
discriminate between software of good and poor reliability. On
the other hand, defect-report-count might be used to validate
the predictor metric number-of-statements so that
number-of-statements can be used to predict reliability when
the actual number of defects in a component is unknown.
Therefore, both metric and operational data are needed to give
the SQM more than one means of evaluating software quality.

⁷A third reason is for high-level, managerial trend analysis.
The reader is directed to Florac's report (Florac, 1992) for
further details.

In concluding this discussion, it is worthwhile to point
out that Musa et al. (Musa et al., 1987) recommend collecting
all data, particularly failure data, during the software's
operational phase. The requirements for good data collection
are outlined in the following sections.

D. PREREQUISITES FOR DATA COLLECTION 

The previous section outlined the need for data collection
to support a metrics program. The first consideration for
data collection is therefore to identify the exact data
requirements for each metric being used. Other considerations
are data collection responsibility, data collection tools and
data storage (IEEE, 1992). Each of these considerations
warrants further discussion.

1. DATA REQUIREMENTS

For defect data, Keller⁸ suggests that an extensive
database of software defect data is essential for reliability
analysis (Keller, 1993). To that end, he recommends
collecting at least the following software defect data:

• Time when failure occurred or was detected.

• How the failure was found: static analysis, testing or
operational use.

• Nature of the failure: code fault or human error.

• History of the fault introduction that caused the failure.

• Conditions or environment which triggered the failure.

• Why the failure was not detected earlier.

• The effect (severity) of the failure.

• Configurations or versions of the software affected.

• Action taken to correct the failure.

⁸Keller lends extensive experience in performing software
reliability analysis on Space Shuttle software to this study.

While this list is not all-inclusive, it does provide a
framework for initial data collection. In addition to
Keller's recommendations, Basili and Weiss (Basili and Weiss,
1984) suggest including the user in the data requirements
discussions. By doing so, they believe, end-user viewpoints
and complaints can be acknowledged early, so that users feel a
part of the data collection process. Appendix B concludes
this discussion by providing a suggested sample defect report
for DISA's use.
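
As a rough illustration of the defect data requirements above, the items recommended by Keller could be held in a record such as the following Python sketch. The class names, field names, the `component_id` key and the integer severity scale are assumptions made for illustration only; they are not taken from Keller or from the sample report in Appendix B.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class DetectionMethod(Enum):
    """How the failure was found (the three options in Keller's list)."""
    STATIC_ANALYSIS = "static analysis"
    TESTING = "testing"
    OPERATIONAL_USE = "operational use"


class FailureNature(Enum):
    """Nature of the failure (the two options in Keller's list)."""
    CODE_FAULT = "code fault"
    HUMAN_ERROR = "human error"


@dataclass
class DefectReport:
    """Hypothetical record for one defect report against one component."""
    component_id: str                  # assumed identifier for the RSC
    detected_at: datetime              # when the failure occurred or was detected
    detection_method: DetectionMethod
    nature: FailureNature
    fault_history: str                 # how the underlying fault was introduced
    trigger_conditions: str            # conditions or environment that triggered it
    reason_not_detected_earlier: str
    severity: int                      # effect of the failure; the scale is assumed
    affected_versions: list = field(default_factory=list)
    corrective_action: str = ""


def defect_report_count(reports, component_id):
    """Count the defect reports filed against one component."""
    return sum(1 for r in reports if r.component_id == component_id)
```

Under these assumptions, the direct metric defect-report-count for a component is simply the number of such records filed against it.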

For metric data, requirements are straightforward: for
each software metric selected, collect all of the data
relevant to its use. For example, for the metric cyclomatic
complexity discussed in Chapter IV, the SQM will want to
collect information on the number of edges (e) and nodes (n).
Data collection for other metrics is done likewise.
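
To make the cyclomatic complexity example concrete, the sketch below computes the metric from edge and node counts using McCabe's usual definition, v(G) = e - n + 2p, where p is the number of connected components (p = 1 for a single routine). It is assumed here that Chapter IV uses this standard form. The function names and the dictionary representation of a flow graph are hypothetical.

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """Return McCabe's cyclomatic complexity v(G) = e - n + 2p."""
    if edges < 0 or nodes < 1 or components < 1:
        raise ValueError("need edges >= 0, nodes >= 1 and components >= 1")
    return edges - nodes + 2 * components


def count_edges_and_nodes(flow_graph):
    """Count (e, n) for a flow graph given as {node: [successor, ...]}."""
    nodes = set(flow_graph)
    for successors in flow_graph.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in flow_graph.values())
    return edges, len(nodes)


# A routine with a single if/else: four edges, four nodes.
graph = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
e, n = count_edges_and_nodes(graph)
complexity = cyclomatic_complexity(e, n)  # 4 - 4 + 2 = 2
```

Storing e and n alongside the derived value lets the SQM recompute the metric later if the definition in use changes.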
2. DATA COLLECTION RESPONSIBILITY

The Reuse Library, which analyzes and re-engineers
software components, is the logical organization to collect
metric data. While the argument might be made that the
software developer is in a better position to collect this
information, care must be exercised by the library to ensure
that accurate data is collected to prevent GIGO⁹. For defect
data, the library will need to rely on the user for timely and
accurate collection. Industry practice is to use "Defect
Reports" for this type of feedback. The reuse library is then
responsible for providing the user with feedback reporting
forms to gather this information. As stated earlier, the
success of the metrics effort relies, in part, on the
cooperation of users; this matter can be dealt with as a
library policy issue.

3. DATA COLLECTION TOOLS 

As with data collection responsibilities, the choice
of data collection tools depends on the category of data. For
software metric data, a number of automated tools exist which
can collect the needed information; for example, analyzers can
capture code metrics during compilation. Further discussion
and recommendations on such tools are beyond the scope of this
study. For defect data, perhaps the best collection tool is
the defect report mentioned earlier.

⁹Garbage In, Garbage Out.
4. DATA STORAGE



For data storage, some type of electronic storage 
facility is desirable. For example, a database would provide 
a means of storing both metric and defect data on software 
components. Electronic storage can also support data 
retrieval and analysis. As with data collection tools, it is 
the library's responsibility to establish a data storage 
mechanism suitable to its needs. 
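
One possible shape for such a storage facility is sketched below using Python's standard `sqlite3` module. The table names, column names and the helper functions `open_store` and `metric_and_defect_counts` are assumptions chosen for illustration; the thesis does not prescribe a schema.

```python
import sqlite3

# Hypothetical schema: one table per kind of data discussed above.
SCHEMA = """
CREATE TABLE component (
    component_id TEXT PRIMARY KEY
);
CREATE TABLE metric_value (
    component_id TEXT NOT NULL REFERENCES component(component_id),
    metric_name  TEXT NOT NULL,
    value        REAL NOT NULL,
    PRIMARY KEY (component_id, metric_name)
);
CREATE TABLE defect_report (
    report_id    INTEGER PRIMARY KEY,
    component_id TEXT NOT NULL REFERENCES component(component_id),
    detected_at  TEXT NOT NULL,
    description  TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and set up the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def metric_and_defect_counts(conn, metric_name):
    """Return (component_id, metric value, defect-report-count) rows."""
    query = """
        SELECT m.component_id, m.value, COUNT(d.report_id)
        FROM metric_value AS m
        LEFT JOIN defect_report AS d ON d.component_id = m.component_id
        WHERE m.metric_name = ?
        GROUP BY m.component_id, m.value
        ORDER BY m.component_id
    """
    return conn.execute(query, (metric_name,)).fetchall()
```

Keeping metric and defect data in one store makes it straightforward to retrieve paired values for each component when the data is later analyzed.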

E. OTHER CONSIDERATIONS 

Developing a plan is the first step in the data collection 
process. The previous section discussed the elements of such 
a plan. However, the plan is not complete without addressing 
the following considerations. 

1. MISINFORMATION PITFALLS

One important requirement in data collection is data
exclusivity. Florac warns that care must be taken to ensure
that the data items collected are mutually exclusive of one
another to avoid duplication in reporting (Florac, 1992). A
second requirement is the use of a good, communicative tool.
As mentioned, a data collection tool (e.g., a defect report)
must provide an effective means of communication between the
user and the reuse library. To do this, it must be designed
so that it is unambiguous, not subject to interpretation and
not redundant.
2. NECESSITY OF DATA VALIDATION

Basili and Weiss (Basili and Weiss, 1984) found in
their research on developing a data collection methodology
that data validation is a necessity for a good data collection
program. They point out that patterns of mistakes and
misclassifications in data reporting become evident as the
collecting agency begins to synthesize and validate user
feedback. To counter these inaccuracies, they suggest
interviewing the defect reporting activities to clarify any
potential misunderstandings. As a final note, they warn
against data entry errors when automated databases are used;
such errors can skew the data without being noticed.
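
Some data entry errors of this kind can be caught mechanically before the data is analyzed. The following sketch shows one way this might be done, assuming defect reports are held as dictionaries; the field names, the set of allowed values and the duplicate test are illustrative assumptions, and such checks would supplement rather than replace the interviews suggested by Basili and Weiss.

```python
# Allowed values follow the "how the failure was found" options listed earlier.
ALLOWED_DETECTION_METHODS = {"static analysis", "testing", "operational use"}
REQUIRED_FIELDS = ("report_id", "component_id", "detected_at", "detection_method")


def validate_reports(reports):
    """Return (report_id, problem) pairs for records that look suspect."""
    problems = []
    seen = set()
    for report in reports:
        report_id = report.get("report_id")
        # Flag required fields that are absent or empty.
        for name in REQUIRED_FIELDS:
            if report.get(name) in (None, ""):
                problems.append((report_id, f"missing {name}"))
        # Flag category values outside the agreed list (e.g. typing errors).
        method = report.get("detection_method")
        if method not in (None, "") and method not in ALLOWED_DETECTION_METHODS:
            problems.append((report_id, f"unknown detection method: {method}"))
        # Flag reports that repeat an earlier component/time/description.
        key = (report.get("component_id"), report.get("detected_at"),
               report.get("description"))
        if key in seen:
            problems.append((report_id, "possible duplicate"))
        seen.add(key)
    return problems
```

Records flagged in this way would be candidates for follow-up with the reporting activity, not for automatic deletion.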

3. REPORTING PITFALLS

As is well known in the software industry, quality
testing only proves that software meets certain criteria; it
in no way guarantees the absence of defects. The same is true
of data reporting. As Musa et al. point out (Musa et al.,
1987), a lack of defect data must be given the same concern as
its presence, because defects often go either undetected or
unreported.

Although Appendix B provides a sample defect report,
this report serves only as an example and may be subject to
modification based on its usefulness and appropriateness in
collecting the necessary data to support the reuse library's
metrics program. Chapter VI continues by discussing the
analysis and validation of data, once collected.
VI. DATA ANALYSIS AND METRICS VALIDATION



A. INTRODUCTION 

As Schneidewind points out (Schneidewind, 1993), metrics
provide a quantitative rather than qualitative basis for 
evaluating software quality. In its discussion on metrics, 
Chapter IV identified two types of metrics that are of 
interest in this study: direct and predictive. 

To be used effectively, each metric needs an associated 
acceptance criterion (threshold value) which distinguishes 
good from poor quality. For example, predictive metrics are 
collected on software during its development; by comparing 
these metric values to the threshold value, the developer can 
determine whether or not quality development goals are being 
met. Likewise, direct metrics are collected during software 
testing and operation; again, these values are compared to an 
acceptable threshold value to determine final product quality. 
The key, then, to a successful metrics program lies in the
choice of both metrics and metric thresholds; metrics and metric
thresholds are only as useful as their ability to indicate
whether or not software quality requirements are being met
(Schneidewind, 1993).

While direct metrics serve as unambiguous discriminators
of software quality, the value of predictor metrics is
initially unknown; therefore, validation of predictor metrics
is necessary before they are applied. This chapter examines
the use of metrics in determining the degree to which
software quality goals are being met and the process of
predictive metrics validation.

B. METRIC DATA ANALYSIS 

A metric data collection program can serve as a valuable 
tool to the software quality manager in the reuse library. 
Predictive metrics can aid the manager in determining which 
RSCs are most likely to be suitable for library use. Direct 
metrics can be used to identify substandard RSCs and to 
support the evaluation of the library's metrics plan. The use 
and interpretation of each of these metrics is presented 
below. 

1. DIRECT METRICS ANALYSIS 

As defined in Chapter IV, the direct metric of
interest in this study is the defect-report-count for a
particular software component. As noted, before metric
analysis can begin, some threshold evaluation criterion must be
established. While an ideal goal is to achieve a
defect-report-count of zero, a less stringent criterion may be
more practical; the choice of this critical value is left to the
organization based on its quality requirements and past
experience. Further guidance is provided in Appendix A, which
lists 15 common industry metrics and suitable threshold values
based on industry experience.

Once an evaluation threshold is established, a
software component can be measured against it. For example,
given a candidate RSC with a defect-report-count of two and a
pass-fail threshold value of two, a reuse library
certification team has three options: accept the RSC for
certification, mark it for re-engineering, or reject it. Or
consider an RSC that has been certified and installed in the
library. As user defect-report-count data is collected, the
certification team can at some point re-evaluate the component
based on this data and the established criteria. As with any
metric, the appropriateness and validity of the direct metric
being used are important and should be subject to evaluation.
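
The comparison against a threshold can be sketched as follows. The sketch assumes that a component passes when its defect-report-count does not exceed the threshold; the thesis leaves both the threshold value and the treatment of a count equal to the threshold to the organization, so the comparison direction and the function names here are illustrative only.

```python
def meets_threshold(defect_report_count, threshold):
    """True if the count does not exceed the threshold (assumed convention)."""
    return defect_report_count <= threshold


def flag_for_review(counts, threshold):
    """Return the ids of components whose count exceeds the threshold."""
    return sorted(component_id for component_id, count in counts.items()
                  if not meets_threshold(count, threshold))


# Hypothetical defect-report-counts for three library components.
counts = {"rsc-a": 2, "rsc-b": 3, "rsc-c": 0}
flagged = flag_for_review(counts, threshold=2)
```

The flagged components are those the certification team would re-evaluate; the choice among accepting, re-engineering or rejecting a component remains a judgment for the team.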

2. PREDICTOR METRIC ANALYSIS 

As identified in Chapter IV, number-of-statements is
a sample predictive metric used in this study. While direct
metrics are used for software product analysis, predictor
metrics are used during the software development
process itself. As with direct metrics, some threshold
evaluation criteria must be established for predictive metrics
before analysis can begin. Once this is done, metrics are
collected on software at specific intervals during its
development; these metrics are then compared to the evaluation
criteria to determine whether the software is being developed
within the guidelines determined to result in good final
product quality.

Predictive metrics can be applied in the reuse library 
in a number of ways. For example, take the RSC that has been 
marked for re-engineering. In this case, predictive metrics 
can be applied during the re-engineering process to ensure it 
is designed to meet better quality standards. Another example 
is that of RSCs which are certified at Level One or Two or for 
which no test data is available. Here, predictive metrics can 
be collected and used to provide some quantitative indication 
of the RSC's reliability potential. 

While direct metrics provide an unambiguous measure of 
reliability (either the software has failed or it hasn't), 
predictor metrics can do only that: predict. For this reason, 
the critical step of predictor metric validation is necessary 
in order to establish their appropriateness as reliability
indicators (Schneidewind, 1992). The next section discusses
metrics validation.

C. PREDICTOR METRICS VALIDATION 

As Schneidewind points out (Schneidewind, 1992), the
purpose of metrics validation is to establish a high degree of
association between a metric and the quality factor it
represents. Since the predictor metric number-of-statements
is used to represent the direct metric defect-report-count
when the latter is unknown, it is necessary to validate
number-of-statements to ensure that a strong association
exists between it and defect-report-count. The validation of
metrics presupposes that two prerequisites are in place before
the process begins: first, a sound methodology for metrics
validation; second, sufficient data to make the validation
results reliable.

The first step in the validation process, then, is to
establish criteria against which metrics can be validated.
Schneidewind (Schneidewind, 1993) provides the following
criteria based on IEEE standards¹⁰:

• Correlation - The variation in defect-report-count must be
strongly associated with the variation in
number-of-statements for a given software component.

• Tracking - A change in defect-report-count must be
accompanied by a directly proportional and positive change
in number-of-statements.

• Consistency - If defect-report-count is rank-ordered for
a given set of software components, number-of-statements
for those components must have the same ordering.

• Predictability - If number-of-statements is to be used as
a predictor of reliability, it must be able to do so
within a given accuracy.

• Discriminative power - Number-of-statements must be able
to distinguish between high and low reliability software.

• Reliability - Number-of-statements must meet all of the
following criteria a given percentage of the time:
correlation, tracking, consistency, predictability and
discriminative power.



¹⁰The quality factor and metric examples from this study are
used in these definitions for clarity.
A given metric does not have to satisfy all criteria. Rather,
the metric must satisfy those criteria that are related to the
applicable quality functions (see the forthcoming "Sample
Analysis" discussion).
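
Two of the criteria above lend themselves to a short numerical illustration. The sketch below assumes that the correlation criterion is checked with the linear (Pearson) correlation coefficient and the consistency criterion with the rank (Spearman) correlation coefficient; these are common choices, but the specific statistics and acceptance levels are left to the organization and the cited standard. The function names and sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Rank correlation: the Pearson coefficient of the two rank vectors."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data for four components.
statements = [100, 200, 300, 400]   # number-of-statements
defects = [1, 3, 2, 4]              # defect-report-count
rho = spearman_rho(statements, defects)
```

A coefficient close to one suggests that ordering components by number-of-statements largely reproduces their ordering by defect-report-count; how close is close enough is a threshold the organization must set.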

With validation criteria established, the validation
process, as outlined by IEEE (IEEE, 1992), can begin. This
process, which consists of drawing a sample of RSC data,
conducting a statistical analysis of the data and recording
the results, is discussed in the following sections.

1. DATA SAMPLE COLLECTION 

As discussed in Chapter V, some means of storing 
software metric and defect data are assumed to be available. 
For example, Chapter III illustrated the use of a reliability
knowledge database (Figure 3.1) which could serve as an 
information repository from which a representative sample of 
RSC metric and failure data would be drawn. 

2. SAMPLE ANALYSIS 

Once a representative sample of RSC data is collected, 
the next step is to test the data with respect to the validity 
criteria outlined above. First of all, Schneidewind 
(Schneidewind, 1992) defines three quality functions to which 
the validity criteria apply: 

• Quality Assessment - criteria used by software quality 
managers to perform a relative (ranking) comparison of 
software quality in a set of components. 
• Quality Control - criteria used by software quality
managers to distinguish components with acceptable and 
unacceptable quality via discriminative value analysis. 

• Quality Prediction - criteria used by software quality 
managers to forecast the quality of components and to flag 
those not meeting requisite standards. 

According to IEEE standards (IEEE, 1992), a metric may
be used for quality analysis only in the areas for which it
passes the validity test. Schneidewind adds that the validity
criteria against which a metric is tested depend on the
quality function requirements. For example, an RSC
certification team may be interested only in validating
discriminative metrics, while the re-engineering team may
consider validating only predictive metrics. In any case, the
organization must determine the breadth of metric-to-criteria
validation. Schneidewind (Schneidewind, 1992) and IEEE (IEEE,
1992) both provide descriptive examples of tests which can be
conducted on a metric to establish its validity with respect to
each of these criteria; the reader is directed to Appendix C
for further information.

Metric validation concludes when all requisite tests
are complete and the appropriate statistical data is
collected. As illustrated at the conclusion of Step 7 in
Figure 4.1, at this point the software quality manager can
evaluate the appropriateness of the original metrics selected
for organizational use. Metrics that failed or performed
poorly in some tests can be eliminated and replaced with other,
more suitable candidates, and the process of data collection
and validation repeated.

3. RESULT DOCUMENTATION 

At the conclusion of metrics validation, all results 
of the tests on the predictive metrics are recorded. As noted 
before, the entire software quality measurement program is an 
iterative process where predictive metrics are chosen and 
applied to a software project. From there, operational and 
test data is collected in the form of direct metric values. 
These direct metric values are then in turn used to validate 
the suitability of the original predictive metrics. 

While the distinction between good and bad predictor
metrics can be made on the basis of the validation results,
the one-time validation of a metric does not guarantee its
future effectiveness; certain considerations are relevant to
any metric validation (IEEE, 1992):

• Need for re-evaluation - a metric validated in one
environment or application may not be valid in another;
conversely, a metric invalidated in one environment or
application may be valid in another.

• Confidence in the validation - as the use of the metric
increases and the same predictable results are achieved,
confidence in the metric will grow.

• Environment - validated metrics should be applied in the
same environment as that in which they were validated to
ensure best predictive ability.



At this point a reliable set of metrics is assumed to
exist for an organization. Because this discussion of the
validation process was general in coverage, Chapter VII
provides a case study to illustrate the entire metrics program
methodology. The final action on the part of the SQM is to
apply the newly validated metrics to a project. A discussion
of this metrics application is included before closing this
chapter.

D. METRICS APPLICATION AND CONCLUSIONS 

As discussed in the metrics validation section above, both
direct and predictive metrics play an important role in
managing software quality. Of the two types, direct metrics
are the more useful in discriminating between good and poor
quality components. In the reuse library environment, direct
metrics can be collected as the RSC is tested or after it is
placed in operation by the user.

Predictive metrics, although less desirable, are useful
when direct metrics are not known. Through the process of
validation, predictive metrics can be shown to be reasonably
accurate in discriminating between good and poor quality
components. As with direct metrics, predictive metrics can be
collected in the reuse library during the process of analyzing
an RSC for certification. At that point, the certification
team can use the resultant measures to predict whether the RSC
will meet operational software quality standards. If not, the
component can be re-engineered, with predictive metrics used
to guide the process.
As mentioned and illustrated throughout this thesis,
metrics can play a key role in any organization's software
quality program. Metrics alone are not a solution to the
reuse quality problem. Rather, they are a tool to be used
prudently by the software quality manager to manage and
improve the quality of organizational software. In concluding
this study, Chapter VII provides an actual example of a
metrics program in practice.
CHAPTER VII. A QUALITY MEASUREMENT PROGRAM CASE STUDY



A. INTRODUCTION 

Chapter III outlined the need in the reuse library 
environment for a quality management program. Figure 3.1 
provided an overview of how such a program could fit into 
DISA's current operations. Chapter IV then provided a
methodology for this program and discussed the framework for 
its implementation. Chapter V continued by developing a 
program implementation plan which discussed the requirements 
for data and the means for collecting it. Chapter VI 
concluded with a discussion on the use of software metric data 
and the need for its validation. This chapter presents a case 
study to tie together the discussions of Chapters III through 
VI. 

This case study is based on research conducted by Norman
F. Schneidewind, whose work has been referenced extensively in
this thesis. Schneidewind was chosen because his
theories for software quality measurement have proven to be
reliable in practice. A primary example is the
successful application of his measurement methodologies to
Space Shuttle avionics software in order to measure and
predict its quality (Schneidewind and Keller, 1992). The
remainder of this chapter provides an overview of
Schneidewind's rationale for using software measurements to
predict quality as well as the means to validate the usage of
those measurements.



B. SOFTWARE QUALITY MEASUREMENT PROCESS OVERVIEW 

While Chapter III outlined a quality measurement 
methodology and Chapter VI discussed the uses of metrics in 
analyzing and predicting quality, this chapter provides a 
detailed discussion of the "why" and "how" associated with 
software quality measurement. 

Schneidewind, in his discussion on metrics validation, 
provides a model of the process and a description of the terms 
used in metrics collection and validation; Figure 7.1 
illustrates this model. 
Figure 7.1 Metrics Validation Process (Schneidewind, 1992)



As shown in this figure, a metrics collection process is
divided into two phases on two time-lines; phase lines divide
metrics collection (Phase I) from metrics validation (Phase
II); time-lines divide projects. At T_1 on Time-line I, a
software project (P_1) is measured and predefined metrics (M)
are collected. At some point in time (T_2) on the same
time-line, quality factor values (F) are collected from
operational and test data on project P_1. At this point, F and
M are tested against some validation criteria to determine if a
suitable association exists between them. In essence, the
objective is to see if M is in some way related to F. With M
validated, a new project P_2 is entered into on Time-line II.
Once again, at T_1 metrics M' are collected. Note that M' is
the same metric as M, only with new and different values. This
time, M' is used to assess, control or predict the quality of
P_2 as it is being developed. Again, at some point in time
(T_2), quality factor values F' are collected. This time M,
M', F and F' are all subjected to a validation process in order
to reevaluate the usefulness of the original metrics (M)
(Schneidewind, 1992).

The following sections provide an example of this process 
being applied to a real-life system. 

C. CASE STUDY 

1. INTRODUCTION 

The basis for this study is research conducted by
Norman F. Schneidewind on the use of metrics on Space Shuttle
software (Schneidewind, 1994). The purpose of this research
was to show that it is possible to collect and validate
metrics which can be applied to future software projects to
predict and control quality. In particular, it shows that it
is possible to demonstrate statistically an association
between metrics and quality factors, in support of the
premise that quality can be controlled in design by confining
software metrics to certain critical values.

The objective of the study is to find some
relationship between metrics and quality factors that will
enable the software quality manager to predict the quality of
large-scale projects. In particular, it seeks to develop two
types of design quality management tools: Boolean
Discriminator Functions to control software quality and
Regression Equations to predict future discrepancy report
counts (Schneidewind, 1994). The methodology for meeting this
objective is discussed below.

2. METHODOLOGY

The first step in this process is the application of
metrics to a project to support quality functions as outlined
by IEEE (IEEE, 1993) and Schneidewind (Schneidewind, 1992).
Next, it is necessary to try to establish some relationship
between a quality factor and one or more selected metrics.
The identification of such a relationship is critical in order
to develop the discriminator values and functions needed to
control software quality. The final step is the application
of non-parametric statistical techniques to candidate metrics
in order to identify those values that best support quality
control (Schneidewind, 1994).

Before continuing this discussion, two matters of 
importance need to be mentioned. First, in his approach to 
this study, Schneidewind found that data smoothing was 
required in order to rationalize the mass of data. Secondly, 
through follow-on research, Schneidewind was able to develop 
regression equations suitable for supporting quality 
prediction. 

The foundation of this research lies in the 
application of two models: a Discriminative Power Model to 
identify quality control metrics; and a Prediction Model to 
identify quality prediction metrics. Each of these models and 
their application is discussed in the following sections. 

D. DISCRIMINATIVE POWER MODEL

1. INTRODUCTION

As defined in the objective statement, the purpose of
this study is to determine if sufficient relations exist
between selected metrics and a quality factor to enable the
Software Quality Manager (SQM) to use these metrics to predict
quality during the developmental phase of future software
projects. The idea is to provide the SQM with some tool for
managing quality control. The prerequisite for the
application of this model is that sufficient software metric
and quality factor data have been collected to provide a
representative sample upon which statistical analysis can be
performed. Data from a large number of software
modules is desirable; in this case, data from 1489 modules
used for Space Shuttle flight control were available and used.

The intended benefit of applying this Discriminative 
Power Model is the identification of one or more metrics with 
associated quality criteria for use by the SQM (Schneidewind, 
1994). The following section describes the Discriminative 
Power Model, and the discriminative power identification and 
validation process. 

2. MODEL PURPOSE

The Discriminative Power Model is used to determine if
a sufficient relationship exists between some metric M and some
quality factor F to allow the SQM to apply M to future
software development projects. As Schneidewind points out, a
metric's validation process validates a metric with respect to
one of the previously defined validity criteria (e.g.,
association, consistency, etc.), where each of the validity
criteria supports one or more of the three quality functions:
quality control, quality assessment and quality prediction
(Schneidewind, 1992). Therefore, application of this model
is desirable in order to identify some critical metric value
M_c and critical quality factor value F_c such that M_c can be
used to discriminate between modules that are above or below
some threshold (Schneidewind, 1994).

For purposes of this study, it is desirable to
identify some M_c that the SQM can use as an indirect measure
of F_c when F_c is not available. For instance, consider an RSC
from which metric M is collected. M_c can be used to evaluate
that RSC in order to establish its potentially good (M ≤ M_c)
or potentially bad (M > M_c) quality.

3. MODEL DEFINITION 

The principal tool used to validate M_c with respect to
F_c in the Discriminative Power Model is Contingency Table
Analysis and the chi-square (χ²) criterion outlined by Conover
(Conover, 1971). Other validity criteria include module
misclassification, required inspections and product quality
(Schneidewind, 1992).

Table 7.1 illustrates a typical contingency table and 
will be used to aid further discussion of Contingency Table 
Analysis as part of the Discriminative Power Model. 

TABLE 7.1 CONTINGENCY TABLE

              M ≤ M_c                       M > M_c

F ≤ F_c       C_{11}                        C_{12}
                                            Type II Misclassifications

F > F_c       C_{21}                        C_{22}
              Type I Misclassifications

It is important to note that during this stage of metrics
validation, all metric and quality factor data are available,
thus enabling the use of the contingency table. Therefore,
using the table's criteria, all modules of interest can be
classified in one of four categories based on whether their M
value is ≤ or > M_c and their F value is ≤ or > F_c.

Note that M_c divides modules into two categories: those with
M > M_c are considered to be potentially poor in quality and
should be examined; those with M ≤ M_c are considered
acceptable. Metric M is then validated by demonstrating that
it is able to divide the table such that C_{11} and C_{22} are
relatively larger than C_{12} and C_{21} (Schneidewind, 1992).
For the perfect discriminator (M_c), C_{12} = C_{21} = 0.
However, perfect discriminative metrics are seldom found;
thus, other statistical methods such as Chi-Square Contingency
Table Analysis are used to determine to what degree M_c serves
as a perfect discriminator (Schneidewind, 1992). Other uses of
the contingency table for metrics validation are explained below.
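The classification into the four cells of Table 7.1 can be sketched in Python. This is a minimal illustration and not Schneidewind's implementation: the function name `contingency_table`, the (M, F) tuple layout and the sample module data are all hypothetical; only the cell definitions are taken from the table.

```python
def contingency_table(modules, m_c, f_c):
    """Count modules in each cell of Table 7.1.

    modules: iterable of (M, F) pairs, one per module (hypothetical layout).
    m_c, f_c: critical metric value and critical quality factor value.
    """
    table = {"C11": 0, "C12": 0, "C21": 0, "C22": 0}
    for m, f in modules:
        row = "1" if f <= f_c else "2"  # row 1: F <= F_c, row 2: F > F_c
        col = "1" if m <= m_c else "2"  # column 1: M <= M_c, column 2: M > M_c
        table["C" + row + col] += 1
    return table


# Invented sample data: (stmts, drcount) for six modules.
sample = [(5, 0), (7, 0), (30, 2), (45, 3), (6, 1), (50, 0)]
table = contingency_table(sample, m_c=8, f_c=0)
print(table)
```

In this invented sample, the module (6, 1) lands in C21 (a Type I misclassification) and the module (50, 0) lands in C12 (a Type II misclassification).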

a. MISCLASSIFICATION 

Two indicative measures of a metric's
discriminative ability are the number of Type I and Type II
Misclassifications it allows to occur (Schneidewind, 1992).
In Table 7.1, a Type I Misclassification occurs when a
module containing more than a desired number of errors is
improperly categorized as acceptable. Conversely, a Type II
Misclassification occurs when a module containing an
acceptable number of errors is categorized as unacceptable.
Thus, the measures of a metric's discriminative potential that
can be drawn from the table are defined as (Schneidewind,
1994):



n : number of modules

P_1 = C_{21} / n                (Percentage of Type I misclassifications)        (1)

P_2 = C_{12} / n                (Percentage of Type II misclassifications)       (2)

P_{12} = (C_{12} + C_{21}) / n  (Percentage of Type I & II misclassifications)   (3)
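As a worked example of equations (1) through (3), the sketch below computes the three misclassification measures from the four cell counts of Table 7.1. The function name and the cell counts are invented for illustration, and n is taken to be the sum of the four cells, since every module falls into exactly one of them.

```python
def misclassification_rates(c11, c12, c21, c22):
    """Return (P_1, P_2, P_12) as proportions of all modules."""
    n = c11 + c12 + c21 + c22  # every module falls in exactly one cell
    p1 = c21 / n               # Type I: F > F_c but classified acceptable
    p2 = c12 / n               # Type II: F <= F_c but classified unacceptable
    p12 = (c12 + c21) / n      # all misclassifications
    return p1, p2, p12


# Hypothetical cell counts for 100 modules.
p1, p2, p12 = misclassification_rates(c11=60, c12=10, c21=5, c22=25)
print(p1, p2, p12)
```

Multiplying each result by 100 gives the percentages referred to in the equation labels.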



b. INSPECTION

Another estimate of the discriminative power of a
metric M with respect to quality factor F is the proportion of
modules inspected and the portion of those inspections that is
wasted (Schneidewind, 1994). This is explained by again looking
at Table 7.1. Here, all modules with M > M_c are subject to
inspection. As discussed in the above section, a number of
those modules are improperly classified and thus represent
wasted inspection efforts. Therefore, these added measures
are defined (Schneidewind, 1994):

I = (C_{12} + C_{22}) / n    (Percentage of modules inspected)          (4)

RI = C_{22} / C_{12}         (Ratio of useful to wasted inspections)    (5)
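The inspection measures in equations (4) and (5) can be illustrated the same way. This sketch assumes that inspected modules are those in the M > M_c column of Table 7.1 (cells C12 and C22), that C22 counts useful inspections and that C12 counts wasted ones; the function name and the counts are hypothetical.

```python
def inspection_measures(c11, c12, c21, c22):
    """Return (I, RI): share of modules inspected and useful-to-wasted ratio."""
    n = c11 + c12 + c21 + c22
    inspected = (c12 + c22) / n  # all modules with M > M_c
    # RI is undefined when no inspection is wasted; infinity is used here
    # as a placeholder choice for that case.
    ratio = c22 / c12 if c12 else float("inf")
    return inspected, ratio


share, ratio = inspection_measures(c11=60, c12=10, c21=5, c22=25)
print(share, ratio)
```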

c. QUALITY

A final estimate of a metric's discriminative power
is the proportion of remaining quality factor values (e.g.,
defect-report-count) in modules not inspected (Schneidewind,
1992). This measure is found by summing the F count for all
modules not inspected and dividing it by the total beginning
F count from all modules. Hence, these final measures are
given (Schneidewind, 1994):

RF  : sum of F for modules not inspected

TF  : sum of F prior to inspection

RFP = RF / TF       (percentage of F left after inspection)               (6)

RFD = RF / n        (density of F left after inspection)                  (7)

RMP = C_{21} / n    (percentage of modules after inspection with F > 0)   (8)
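A short sketch of equations (6) through (8) follows. It assumes that modules with M ≤ M_c are the ones not inspected and that the critical quality factor value is zero, as the label on equation (8) suggests; the function name and the module data are invented.

```python
def remaining_quality(modules, m_c, f_c=0):
    """Compute RF, TF, RFP, RFD and RMP for (M, F) pairs."""
    n = len(modules)
    tf = sum(f for _, f in modules)                        # F before inspection
    skipped = [(m, f) for m, f in modules if m <= m_c]     # not inspected
    rf = sum(f for _, f in skipped)                        # F left in skipped modules
    c21 = sum(1 for _, f in skipped if f > f_c)            # skipped but defective
    return {
        "RF": rf,
        "TF": tf,
        "RFP": rf / tf if tf else 0.0,
        "RFD": rf / n,
        "RMP": c21 / n,
    }


# Invented sample data: (stmts, drcount) for six modules.
sample = [(5, 0), (7, 0), (30, 2), (45, 3), (6, 1), (50, 0)]
result = remaining_quality(sample, m_c=8)
print(result)
```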

With a model for metric validation defined, the
next section focuses on the validation process itself.

4. THE ANALYTICAL PROCESS 

The analysis process discussed here is based on the 
actual research conducted by Schneidewind on Space Shuttle 
software. Schneidewind, in his paper (Schneidewind, 1994), 
identifies and defines the metrics and quality factor shown in 
Table D-1 in Appendix D. These metrics and factor data were 
collected from 1489 flight software modules. 

The first step in this analysis process is the
selection of candidate metrics to be tested against the
validity criteria to determine their potential as
discriminators and predictors of quality. Initial scatter
diagrams and histograms often provide a general indication of
correlations among data. However, in this case, neither tool
provided any conclusive results. Instead, Principal
Components and Factor Analysis were used to support
preliminary candidate metric selection (Schneidewind, 1994).
Definitions and descriptions of both Principal
Components and Factor Analysis procedures are outlined below.

a. PRINCIPAL COMPONENTS ANALYSIS

The objective in Principal Components Analysis is
to identify a few weighted combinations of metrics that: are
independent; account for the greatest variation in the
metrics; and have a high correlation with the given quality
factor (Schneidewind, 1994). Figures D-1 and D-2 in Appendix
D show the results of applying component analysis to the 13
metrics and quality factor identified in Table D-1.

In analyzing Figure D-1, the notion is to identify
metrics that have a high value with respect to one
component line (e.g., Component 1) and a low value with
respect to the other (e.g., Component 2)
(Schneidewind, 1994). In this case it appears that stmts and
nodes have the highest values along the Component 1 line (.319
and .311 respectively) and low values along the Component 2
line (.12 and -.228 respectively). A similar analysis
technique is applied to Figure D-2.

In Figure D-2, lines from the origin to a metric
represent the metric's contribution to a principal component.
Here again stmts and nodes have high values along the
Component 1 line and relatively low values along the Component
2 line. One analytical feature of this graph is the fact that
the angle between any two metric lines is inversely
proportional to the correlation between them (Schneidewind,
1994). Application of this feature to Figure D-2 indicates an
apparent strong correlation between stmts and drcount.

Principal Components Analysis provides one tool for
metric assessment; additionally, Factor Analysis was used to
support Principal Components Analysis findings. Subsequent
application of Factor Analysis confirmed stmts and nodes as
suitable metrics for validation testing (Schneidewind, 1994).
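To make the idea of component loadings concrete, the following sketch extracts the first principal component of a small set of metrics using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in the cited study: power iteration is one simple way to obtain the leading component, the function names are hypothetical, the three metric columns are invented, and constant columns (zero standard deviation) are not handled.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length metric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        standardized.append([(x - mean) / sd for x in col])
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[j])) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def first_component(corr, iterations=200):
    """Loadings and eigenvalue of the leading component by power iteration."""
    k = len(corr)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return v, eigenvalue


# Invented data: two strongly related size metrics and one unrelated metric.
stmts = [10, 20, 30, 40, 50]
nodes = [5, 11, 14, 21, 24]
other = [3, 1, 4, 1, 5]
loadings, eigenvalue = first_component(correlation_matrix([stmts, nodes, other]))
print([round(x, 3) for x in loadings], round(eigenvalue, 3))
```

In this invented data set the two related metrics receive the largest loadings on the first component, which mirrors the way stmts and nodes stand out along the Component 1 line.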

b. CONCLUSIONS 

In the analysis process, 13 metrics and one quality
factor were analyzed. Principal Components and Factor Analysis
of these metrics suggested that stmts and nodes were the most
suitable of the 13 metrics for potential validation.
Additionally, edges, maxpath and avepath appeared to be viable
contenders; future validation efforts can be expanded to
include them (Schneidewind, 1994).

Based on the above conclusions, the metrics stmts
and nodes, along with the quality factor drcount, will be the
subjects of interest for the remainder of this study.

5. DISCRIMINATIVE POWER VALIDATION MODEL APPLICATION



The initial search for candidate metrics
concluded that stmts and nodes appeared to be the best choice
for purposes of this study. Further analysis, which compared
drcount to stmts and nodes using the graphical representations
of histograms, revealed a clustering of data points for low
values of both the metrics and quality factor, indicating that,
if the critical value of drcount is low, then the associated
critical values of stmts and nodes will be low (Schneidewind,
1994). Figure D-3 in Appendix D illustrates the plotting of
the unsmoothed data points of stmts versus drcount; a similar
plotting of data for nodes versus drcount yielded the same
results. From the figure, it is evident that data smoothing
was necessary to extract useful information from the data.

a. DATA SMOOTHING

As Figure D-3 illustrated, little association is
evident between stmts and drcount before data smoothing
occurs. In order to refine the data, 92 of the 1489 initial
modules were removed from analysis because they
contained a zero stmts count (these modules contained assembly
code, which is not counted as statements for this project).
Data smoothing was then performed on the remaining 1397
modules by dividing the modules into 12 statistical classes
representing 97.7% of the modules. Table D-2 in Appendix D
shows the range and standard deviation for each of the
remaining modules. Figures D-4 and D-5 show the plots of
avestmts and avenodes versus avedrcount after data smoothing
is performed (Schneidewind, 1994).
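The smoothing step can be sketched as grouping modules into classes and averaging within each class. The sketch below is an assumption-laden illustration: the class boundaries, the (stmts, nodes, drcount) tuple layout, the function name and the sample data are hypothetical, and the actual class ranges of Table D-2 are not reproduced. Modules above the last class boundary are simply left out, which is one possible reading of the classes covering less than 100% of the modules.

```python
def smooth(modules, class_upper_bounds):
    """Average (stmts, nodes, drcount) within classes defined on stmts.

    modules: list of (stmts, nodes, drcount) tuples.
    class_upper_bounds: ascending upper limits on stmts, one per class.
    """
    kept = [m for m in modules if m[0] > 0]  # drop zero-stmts modules
    classes = [[] for _ in class_upper_bounds]
    for mod in kept:
        for k, upper in enumerate(class_upper_bounds):
            if mod[0] <= upper:
                classes[k].append(mod)
                break  # modules above the last bound fall in no class
    averages = []
    for members in classes:
        if members:
            count = len(members)
            averages.append(
                tuple(sum(mod[i] for mod in members) / count for i in range(3))
            )
    return averages


# Invented sample: one zero-stmts module and one very large outlier.
sample = [(0, 4, 1), (5, 6, 0), (9, 10, 0), (20, 15, 1), (30, 25, 3), (500, 200, 9)]
print(smooth(sample, class_upper_bounds=[10, 40]))
```

Each returned tuple corresponds to one class and holds the class averages that the text calls avestmts, avenodes and avedrcount.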

At this point, the analytical process can begin.
Here, using contingency table analysis and equations (1)
through (8), Table D-3 is produced, where the values for D_c,
S_c, and N_c are derived from the first three classes in Table
D-2 using rounded-down average (Ave) values. For example, from
row one, average drcount = 0, average nodes = 9 and average
stmts = 8 when rounded down to achieve whole numbers.
Additionally, note that where values appear for both S_c and
N_c, these two metrics are applied compositely using the OR
function. With Table D-3 defined, the statistical validation
of stmts and nodes can be carried out.
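The derivation of critical values and the composite OR rule can be sketched as follows. The un-rounded class averages used here (8.6, 9.4 and 0.3) are invented so that rounding down reproduces the whole numbers quoted in the text; the function names are hypothetical.

```python
import math


def critical_values(ave_stmts, ave_nodes, ave_drcount):
    """Round class averages down to whole-number critical values (S_c, N_c, D_c)."""
    return math.floor(ave_stmts), math.floor(ave_nodes), math.floor(ave_drcount)


def flag_for_inspection(stmts, nodes, s_c=None, n_c=None):
    """Flag a module if any supplied threshold is exceeded (OR function)."""
    checks = []
    if s_c is not None:
        checks.append(stmts > s_c)
    if n_c is not None:
        checks.append(nodes > n_c)
    return any(checks)


s_c, n_c, d_c = critical_values(8.6, 9.4, 0.3)  # hypothetical averages
print(s_c, n_c, d_c)
print(flag_for_inspection(stmts=3, nodes=12, s_c=s_c, n_c=n_c))
```

Passing only one of `s_c` or `n_c` applies a single metric; passing both applies the two metrics compositely.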
b. STATISTICAL VALIDATION 

This step in the model focuses on validating the
selected metrics statistically. Here, Chi-square Analysis is
applied, where a high chi-square value and a corresponding low
significance value are desirable. In Table D-3, the computed
chi-square exceeds the critical value (χ² = 10.83 at α = .001)
and the computed significance level is 0 to five places for
all cases; this provides sufficient statistical validation for
all functions in the table (Schneidewind, 1994). Note that
the function (S_c = 8 OR N_c = 9) produces the highest
chi-square of any single or combined metric pair application.
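For a 2 x 2 contingency table, the chi-square statistic can be computed directly from the four cell counts. The sketch below uses the standard Pearson formula without continuity correction, which is an assumption about the exact variant applied in the cited study; the cell counts are invented and the critical value 10.83 is the one quoted in the text.

```python
def chi_square_2x2(c11, c12, c21, c22):
    """Pearson chi-square statistic for a 2 x 2 table (no continuity correction)."""
    n = c11 + c12 + c21 + c22
    row1, row2 = c11 + c12, c21 + c22
    col1, col2 = c11 + c21, c12 + c22
    if 0 in (row1, row2, col1, col2):
        raise ValueError("chi-square is undefined when a row or column is empty")
    return n * (c11 * c22 - c12 * c21) ** 2 / (row1 * row2 * col1 * col2)


CRITICAL_VALUE = 10.83  # critical chi-square at alpha = .001, as quoted above

chi2 = chi_square_2x2(c11=60, c12=10, c21=5, c22=25)  # hypothetical counts
print(round(chi2, 3), chi2 > CRITICAL_VALUE)
```

A metric whose table yields a statistic above the critical value is treated as statistically associated with the quality factor at the chosen significance level.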






In addition to statistical validation, validation 
by application is desirable for both stmts and nodes. This 
process is outlined next. 

c. METRIC APPLICATION

At this point, the metrics stmts and nodes have
been statistically validated; therefore, Table D-3 can now be
used by the SQM to support quality management decisions. For
example, if the SQM were interested in high-quality,
high-inspection requirements, then the choice of small values
for S_c and N_c would be appropriate. Conversely, for lower
quality and lower inspection requirements, larger values for
both metrics would be needed. Additionally, by using S_c and
N_c in combination using the OR function, better results are
possible than if the two metrics are used singly; Figure D-6
illustrates the effectiveness of this combination. Therefore,
by varying the values selected for S_c and N_c, the SQM gains
flexibility in the selection and application of software
inspection requirements. One final issue to address is the
tradeoffs inherent in metrics choice and use.
d. CONCLUSIONS 

As mentioned above, choice of critical metric
values (M_c) provides the SQM with the ability to modulate both
quality and inspection requirements. Often there exists a
tradeoff between quality requirements and the cost of module
inspections (Schneidewind, 1994). For example, if the cost
associated with inspections were no object, the SQM could
inspect all modules with (S > 8 OR N > 9) (Table D-3),
resulting in 67.9% of all modules being inspected.
Conversely, if budgetary constraints limited inspections to
50% of all modules and an RFP of 18.1% was acceptable, only
modules with (S > 48 OR N > 21) could be inspected. Finally,
as discussed earlier, the process of metric selection and
validation is an iterative one where no single metric is
considered permanently valid. Rather, as more software
development and operational data are collected, current
metrics should be subjected once again to the revalidation
process. The next section introduces a power model for
metrics validation that focuses on the predictive, vice
discriminative, abilities of a metric.
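The tradeoff between inspection cost and remaining defects can be expressed as a simple selection rule: among candidate threshold pairs, keep those whose inspection share fits the budget and pick the one that leaves the smallest RFP. The sketch below is only an illustration; the function name is hypothetical and every row of the candidate list is invented rather than taken from Table D-3.

```python
def choose_thresholds(candidates, max_inspected):
    """Return the feasible candidate with the lowest RFP, or None if none fits."""
    feasible = [c for c in candidates if c["I"] <= max_inspected]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["RFP"])


# Invented candidate rows: thresholds, inspection share I, remaining share RFP.
candidates = [
    {"S_c": 10, "N_c": 10, "I": 0.70, "RFP": 0.04},
    {"S_c": 25, "N_c": 15, "I": 0.55, "RFP": 0.10},
    {"S_c": 50, "N_c": 20, "I": 0.45, "RFP": 0.20},
    {"S_c": 100, "N_c": 40, "I": 0.25, "RFP": 0.40},
]
best = choose_thresholds(candidates, max_inspected=0.50)
print(best)
```

Raising `max_inspected` moves the choice toward smaller thresholds and lower RFP, which is the direction of the tradeoff described above.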

E. METRICS PREDICTABILITY MODEL 
1. INTRODUCTION

The discriminative power validation process is useful 
in proving that select metrics have a strong correlation to 
some quality factor. Another desirable feature of metrics, 
with respect to software quality measurement, is the ability 
to predict software quality in the absence of quality factors 
(Schneidewind, 1992). For example, if the number of
discrepancies in a software module is unknown, the SQM would
like to be able to use stmts or node counts to predict the
expected discrepancies in that module.

While the discriminative property of metrics applies 
to the segregation of low from high quality software modules, 
the predictive property applies to the use of prediction to 
estimate a particular module's behavior relative to some 
unknown quality factor. The following sections discuss the 
process of validating metrics for use as quality predictors. 

2. PREDICTABILITY CRITERIA 

In order for a metric M to satisfy the predictability 
criteria it must meet the following condition (Schneidewind, 
1992) : 



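| (F_{aT2} - F_{pT2}) / F_{aT2} | < β_p        (9)

where F_{aT2} and F_{pT2} are the actual and predicted values of
the quality factor at time T2.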



In essence, for some function f(M) using metric M' (remember
that M' = M) collected at time T1 (refer to Figure 7.1,
Time-line II), f(M) must be able to predict the quality factor
F at time T2 with an accuracy of β_p. Figure 7.2 provides a
graphical depiction of the variance criteria for f(M). As
this figure illustrates, in the ideal situation, f(M) = F_a,
the actual value of the quality factor; however, this is
seldom the case. Instead, metric M' and function f(M) are
acceptable if f(M) falls within the tolerance band around
F_a (Schneidewind, 1992).

[Figure 7.2: plot of F and f(M) against Application Project
Time, with T1 and T2 marked on the time axis; the curves are
labeled Perfect Predictability and Imperfect Predictability.]

Figure 7.2 Predictability Criterion for f(M)
(Schneidewind, 1992)
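The predictability criterion amounts to a relative-error check against a tolerance. The sketch below assumes the criterion compares the absolute relative difference between the actual and predicted quality factor values at T2 with β_p; the function name, the sample values and the tolerance of 0.2 are hypothetical.

```python
def satisfies_predictability(actual, predicted, beta_p):
    """True if the relative prediction error is within the tolerance beta_p."""
    if actual == 0:
        raise ValueError("relative error is undefined for an actual value of zero")
    return abs((actual - predicted) / actual) < beta_p


# Invented values: actual quality factor 10, two competing predictions.
print(satisfies_predictability(actual=10, predicted=9, beta_p=0.2))
print(satisfies_predictability(actual=10, predicted=13, beta_p=0.2))
```

The first prediction is off by 10% and passes; the second is off by 30% and fails.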

The following sections continue this discussion by
presenting the analytical portion of the Metrics Power Model.

3. STATISTICAL ANALYSIS 

As a result of discriminative power modelling, stmts
and nodes were identified as two potentially useful metrics.
Figures D-4 and D-5 (Appendix D) show that avedrcount exhibits
a linear relationship with respect to avenodes and a non-linear
relationship with respect to avestmts. Additionally, Figure
D-7 extends this discussion by illustrating that avedrcount
exhibits a non-linear relationship with respect to the
combination of avenodes and avestmts. These relationships
provide the basis for regression analysis.

4. REGRESSION MODEL DEVELOPMENT



Based on the relationships between drcount and
avenodes and avestmts outlined above, regression analysis was
performed on the metrics stmts and nodes using the data from
the 12 classes in Table D-2 to derive the following regression
models (Schneidewind, 1994):

D_a(s) = exp(.242 + .00523 S_a)                          (10)

D_a(n) = -.262 + .0658 N_a                               (11)

D_a(sn) = exp(.348 + .00194 S_a + .00826 N_a)            (12)

using the following notations:

S_a     : avestmts used to produce D_a(s) and D_a(sn), or given
          value in D_a(s) and D_a(sn) used as predictors

N_a     : avenodes used to produce D_a(n) and D_a(sn), or given
          value in D_a(n) and D_a(sn) used as predictors

D_a     : avedrcount used to produce D_a(s), D_a(n), and D_a(sn)

D_a(s)  : predicted avedrcount as a function of avestmts

D_a(n)  : predicted avedrcount as a function of avenodes

D_a(sn) : predicted avedrcount as a function of avestmts and avenodes

D_a'    : actual avedrcount
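The three regression models translate directly into code. The coefficients below are those of equations (10) through (12); the function names and the input values of 100 statements and 50 nodes are hypothetical.

```python
import math


def predict_from_stmts(s_a):
    """Equation (10): predicted avedrcount from avestmts."""
    return math.exp(0.242 + 0.00523 * s_a)


def predict_from_nodes(n_a):
    """Equation (11): predicted avedrcount from avenodes."""
    return -0.262 + 0.0658 * n_a


def predict_from_both(s_a, n_a):
    """Equation (12): predicted avedrcount from avestmts and avenodes."""
    return math.exp(0.348 + 0.00194 * s_a + 0.00826 * n_a)


s_a, n_a = 100.0, 50.0  # hypothetical averages
print(round(predict_from_stmts(s_a), 3))
print(round(predict_from_nodes(n_a), 3))
print(round(predict_from_both(s_a, n_a), 3))
```

Note that the linear model of equation (11) yields a negative prediction for very small node counts (below about 4 nodes), so it is only meaningful over the range of the data from which it was fitted.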

Figures D-8, D-9 and D-10 plot the regression analysis
equations (10), (11) and (12) against actual data from the
modules. Although these plots demonstrate a fairly good fit
between the actual and predicted values, further statistical
validation is desirable.

5. PREDICTABILITY VALIDATION MODEL APPLICATION



At this point, the predictability validation model is
applied in order to validate the degree to which the selected
metrics can predict quality. Using equations (13) through (15)
(Schneidewind, 1994), which define MRE and the related error
measures as sums over the classes k = 1, ..., C
(C = Number of Classes), Table 7.2 is produced:



TABLE 7.2 PREDICTABILITY VALIDITY CRITERION
(Schneidewind, 1994)

            MRE     MRE SD    MSE      MR           MR SD

D_a(s)      .247    .301      1.104    .0151        1.097

D_a(n)      .127    .117      .281     -.0000768    .554

D_a(sn)     .192    .388      .198     -.0300       .463

MRE:    Mean Relative Error
MRE SD: MRE Standard Deviation
MSE:    Mean Square Error
MR:     Mean Residual
MR SD:  MR Standard Deviation

Three evaluators of the goodness of fit of D_a(s), D_a(n) and
D_a(sn) with respect to the actual data (shown in Figures D-8
through D-10) are MRE, MSE, and MR (Schneidewind, 1994). Here,
MRE measures prediction error relative to avedrcount. MSE helps
by minimizing the sum of the variance and square of the bias
of avedrcount. Finally, MR provides a measure of the observed
versus the predicted values of drcount, without consideration
for sign.
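The goodness-of-fit measures can be sketched under conventional definitions. These definitions are assumptions, since the exact equations are not restated here: residuals are taken as actual minus predicted, MRE as the mean absolute residual relative to the actual value, MSE as the mean squared residual over the C classes, MR as the mean signed residual, and the standard deviations as sample standard deviations. The function name and the two-class data are invented.

```python
import statistics


def fit_measures(actual, predicted):
    """Return MRE, MRE SD, MSE, MR and MR SD for paired class values."""
    residuals = [a - p for a, p in zip(actual, predicted)]  # assumed sign convention
    relative = [abs(r) / a for r, a in zip(residuals, actual)]
    c = len(actual)  # number of classes
    return {
        "MRE": sum(relative) / c,
        "MRE SD": statistics.stdev(relative),
        "MSE": sum(r * r for r in residuals) / c,
        "MR": sum(residuals) / c,
        "MR SD": statistics.stdev(residuals),
    }


# Invented two-class example.
measures = fit_measures(actual=[2.0, 4.0], predicted=[1.0, 5.0])
print(measures)
```

Under these assumptions a predictor can have a mean residual near zero (errors of opposite sign cancel) while still showing a sizeable MRE or MSE, which is why several measures are examined together.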

In analysis, residual plots are useful for
demonstrating whether or not there is stability in predictions
(Schneidewind, 1994). An examination of Table 7.2 reveals
that there is no clear winner in all categories. For that
reason, and because:

• Several predictors are more desirable than one.

• Often stmts is the only metric available early on in the
software development cycle.

• It is desirable to revalidate all predictors using other,
new data.

the predictor functions D_a(s), D_a(n), and D_a(sn) are all
considered useful and can be used in predicting software
quality (Schneidewind, 1994). The next section describes the
application of these functions to software projects.

6. PREDICTOR METRIC APPLICATION 

Subsequent to the selection of predictor functions is
the reapplication of these functions to the project from
which they were derived in order to measure their validity.
By obtaining random samples from the 1397 modules of upper and
lower limits of stmts and nodes for three cases and computing
the predicted and actual drcount, Schneidewind was able to
construct Table 7.3:

TABLE 7.3 SAMPLE OF PREDICTIVE METRICS APPLICATION
(Schneidewind, 1994)

S_a       N_a       D_a'    D_a(s)   D_a(n)   D_a(sn)

264.39    142.17    5.87    5.09     9.09     7.66

312.24    132.62    7.32    6.52     8.46     7.76

167.08    83.88     3.00    3.05     5.26     3.92



From this table it is apparent that D_a(n) failed in all three
cases as a good predictor of actual drcount. On the other
hand, D_a(s) and the metrics combination D_a(sn) demonstrate
favorable predictive abilities. While many more such tests
are necessary before the results can be conclusive, this
example illustrates the value and variability of predictor
metrics in actual application.
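The comparison drawn from Table 7.3 can be checked arithmetically by computing the relative error of each predictor in each of the three cases. The values below are copied from Table 7.3; the variable and function names are hypothetical, and relative error is taken as the absolute difference divided by the actual value.

```python
# Each row: (actual D_a', predicted D_a(s), predicted D_a(n), predicted D_a(sn))
rows = [
    (5.87, 5.09, 9.09, 7.66),
    (7.32, 6.52, 8.46, 7.76),
    (3.00, 3.05, 5.26, 3.92),
]
names = ("s", "n", "sn")


def relative_errors(table_rows):
    """Relative error of each predictor for each sampled case."""
    errors = {name: [] for name in names}
    for actual, *predicted in table_rows:
        for name, value in zip(names, predicted):
            errors[name].append(abs(actual - value) / actual)
    return errors


errors = relative_errors(rows)
mean_error = {name: sum(values) / len(values) for name, values in errors.items()}
for name in names:
    print(name, [round(e, 3) for e in errors[name]], round(mean_error[name], 3))
```

In all three cases the nodes-only predictor shows the largest relative error of the three, which is the basis for the observation above.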

In concluding this discussion it is noted that 
validated predictor metrics are suitable in three applications 
by the SQM (Schneidewind, 1994) : 

• When metrics for software modules are available and it is 
desired to form some prediction of the effect of those 
metrics on the module's quality. 

• When metrics are available and it is desired to predict 
the effect of changes in design on the module's quality. 

• When it is desired to use predictor metrics in the actual 
design process before coding begins.

As with discriminative metrics, predictor metrics
provide the SQM with yet another tool by which quality 
analysis and prediction can be performed on software modules 
in the absence of any quantitative quality factor data. 

F. CONCLUSIONS 

Based on the research in this case study, the use of 
metrics to predict quality was successful (Schneidewind,
1994). It was found that Boolean OR functions could be
developed for metrics to serve as discriminators of quality. 
It was also found that two metrics, when used together, might 
be better discriminators of quality than one metric alone. 
Finally, it was shown that regression equations can be 
developed which can serve as predictors of quality. 

Of significant importance is the role that data smoothing 
played in this research; without data smoothing, these results 
would not have been achieved. While further research is 
necessary to continually improve and validate the process of 
metrics use, this study illustrates a starting point for such 
an effort.

CHAPTER VIII. CONCLUSIONS



A. THE COST OF QUALITY IMPROVEMENT 

One argument with respect to improving software quality in 
the reuse library is that of its associated costs. In 
general, the DoD executive expects a significant return in 
terms of time or cost savings for resources dedicated to 
software quality improvement. 

The response to this argument lies in the examination of
the purpose of a quality measurement program. A quality
measurement program provides the SQM with a resource
management tool. For example, a SQM can apply select
predictor metrics (the process of metrics selection and
validation was outlined earlier in this thesis) to software
components in order to distinguish the potentially good from
the potentially bad. Cost savings are realized in this case
when the SQM is able to minimize the wasted testing and
inspection of good components and instead redirect critical
resources to only those potentially bad components.

The major cost associated with the proposed quality
improvement program lies in the establishment of a reliability
database to support metrics evaluation. In many organizations
(e.g., NASA), developing such a database involves significant
time and money to test and document numerous software
components. In this case, the reuse library has a particular
advantage; it can rely on its users to test library components
as part of their systems development. The reuse library need
only then develop and maintain a record of user feedback on
these testing results. Notably, however, the success of this
effort depends on willing cooperation and reliable feedback
from users.

In any case, quality gains will not come without costs. 
Executive level managers should consider the long term 
benefits of investing in software quality improvement now. 

B. CHOICE OF METRICS 

Defect-report-count and number-of-statements were used as
examples of metrics in this thesis; in practice, any number of
metrics can be used. In fact, it is particularly desirable to
start a quality measurement program with a number of candidate
predictor metrics. Then, during the metrics validation
process (Chapter VI), those metrics found unsuitable for
quality prediction purposes can be eliminated.

As mentioned, Appendix A provides 15 sample metrics
generally used in industry today. Additionally, IEEE (IEEE,
1992) lists a number of both direct and indirect metrics and
includes a meaningful example (Annex C) illustrating the
entire measurement methodology. Finally, Chapter VII provides
a case study that outlines the process using the original
metrics that NASA selected for its Space Shuttle software
improvement effort.

Ultimately it is up to the organization to select and
validate metrics suitable for its needs. Likewise, it is up
to the organization to determine the critical value thresholds
to which these metrics will be held. For example, a SQM may
determine that a defect-report-count threshold of two is
acceptable for information system type RSCs. On the other
hand, the SQM may set a threshold value of zero
defect-report-counts for flight critical type RSCs.

C. METRICS AS "THE" QUALITY IMPROVEMENT SOLUTION 

As stated earlier in this thesis, a metrics program is not
the solution to the DoD's software development problems.
Rather, a good metrics program can play a supporting role in 
building a library of quality reusable components. To support 
DoD systems development, the reuse library begins by 
identifying the types of RSCs that its users need. Next, it 
collects RSCs of this type with the goal of developing them to 
a Level 4 status that is most valuable to the user. Finally, 
the SQM can focus on improving the quality of library assets. 
To that extent, metrics will play an important role by
providing the SQM with a tool to distinguish quality among
components and to aid in determining the allocation of
resources for RSC quality improvement.

D. SUMMARY



The author of this thesis does not suggest that the 
quality improvement methodology presented here is the only 
solution; rather, this is one methodology that can be used to 
support quality improvement. The author does believe that 
quality improvement measures are an important element in the 
development of a repository of high quality software 
components that can be used by systems developers to reduce 
their development time and costs. 

The methodology presented here has been shown to work and 
can be readily applied in the reuse library environment. 
While acknowledging the costs associated with quality 
improvement, there is perhaps a greater cost associated with 
ignoring it. A real example of such danger lies in the 
history of software development methodologies. 

In its infancy, software was often developed in an ad hoc
fashion without any formal methodology. Today, many years
later, the software industry is still suffering the
consequences of not having had the foresight to develop and
implement standard software engineering practices in the early
days of software development. DoD today starts a new era
with the development of software reuse libraries. Component
quality needs to be given proper attention now, while the reuse
initiative is in its infancy and provides a suitable
ground-level entry point for any quality improvement effort;
later on, after huge repositories of software have been built,
it may be too late to achieve quality improvement.

APPENDIX A



"BEST OF CLASS" MEASURES 



A. INTRODUCTION 

Siefert conducted one and one-half years of research on
industry's use of software measures. To collect his data, he
polled 350 organizations world-wide to determine the extent of
their use of 39 industry software measures defined by IEEE (IEEE,
1988). He concluded his work by identifying 15 "Best of Class"
measures evaluated on the criteria of their importance as well as
their frequency and ease of use. Along with statistical preference
data, he gathered information on the ways in which these measures
are used and the standards by which they are evaluated (Siefert,
1989). The next section discusses his recommendation for applying
the results of his study.

B. APPLICATION 

While all 39 of the IEEE measures are of importance, Siefert
points out that he omitted from his top 15 list a number of
measures for which he received no significant response. He goes on
to suggest that the remaining 15 measures, which form a normalized
composite of his research, provide a reasonable, low-risk starting
point from which an organization can begin to build its measurement
program. Siefert supports the methodology outlined in this thesis 
for establishing a software quality measurement program. In 
following this methodology, he suggests that an organization define 
its goals and then select and implement two or three of his "Best" 
measures using IEEE writings (IEEE, 1988) for guidance. He 
concludes by commenting that while his measures provide a 
statistical basis for metric selection, measures and standards 
selection should be driven by organizational experience and current 
technology. (Siefert, 1988) 

The following tables are reproductions of the findings of 
Siefert with a few modifications in the interest of brevity and 
conserving space. The reader is directed to Siefert 's work 
(Siefert, 1988) for a more detailed explanation of his research and 
findings . 
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TABLE A-1 "BEST OF CLASS" MEASURE'S USAGE 



RANK 


MEASURE 


MEASURE USAGE 


1 


Fault Density 


- Predict the remaining faults and system availability 

- Evaluate faults per N source lines of executable code 

• Applicable to predictive reliabili^ models 


2 


Failure Rate 


- Evaluate failure rates to operation time 

- Indicator of quality of software 


3 


Error Distribution 


- Show correlation between module length and error 
distribution 

- Indicate need for further testing 


9 


Defect Density 


- Measure reliability growth 

- Define defects per number of executable source LOC 

- Applicable to predictive reliability models 


5 


Cumulative Failure 
Profile 


- Indicate software quality 

- Applicable to predictive reliability models 

- Supports measure # 1 1 


5 


Failure Analysis 


- Measure software quabty 


6 


Test Coverage 


- Determmc quality of testmg 

- Evaluate lest coverage adequacy 


7 


Fault Days Number 


- Used for release decisions 


8 


Cyclomatic Complexity 


- Used to estimate minimal cases 

- Show maintainabiljty/lcstabiJity' 

- Determine complexity 


9 


Entries and Exits 


(not available) 


12 


Functional Test 
Coverage 


- Used for release decisions 


11 


Mean Time to Failure 


- Indicator of quality of softu are 

- Calculated from slope extracted from graph of measure ^ 5 


12 


Halstead- Software 
Science Difficulty 


- Determine latent defects content 

- Deterrmne software size and complexity 


13 


Graph -Theoretic 
Complexity 


- Used to determine where system level testing should concentrate 


14 


Source Listings and 
Documentation 


- Used as pan of mspecuon process 


15 


HW/SW Operational 
Availability 


- Project systems availability 
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TABLE A- 2 



"BEST OF CLASS" MEASURE STANDARDS 

RANK  MEASURE                           STANDARD 

1     Fault Density                     - Rayleigh distribution and historical data versus set goals 
                                        - Faults counted per N LOC 
                                        - Standards specific to project 

2     Failure Rate                      - Failure versus execution time 
                                        - Project specific specifications 
                                        - Better than past results 

3     Error Distribution                - Normal distribution (desirable, but usually not found) 

4     Defect Density                    - Defects counted per N LOC 
                                        - Less than .1 defect per 1,000 lines of code 

5     Cumulative Failure Profile        - Parabolic shape and flattening over time 

5     Failure Analysis                  (not available) 

6     Test Coverage                     - Must exceed [value illegible] 
                                        - 100% of non-reused code tested 

7     Fault Days Number                 (not available) 

8     Cyclomatic Complexity             - 10 or less (realistically, application dependent) 

9     Entries and Exits                 (not available) 

10    Functional Test Coverage          - 100% of functions tested 

11    Mean Time to Failure              - Better than past results 
                                        - Minimum 2,000 hours (system requirements dependent) 

12    Halstead Software Science         (not available) 
      Difficulty 

13    Graph-Theoretic Complexity        - Comparison among other similar programs and past error 
                                          history 

14    Software Source Listings and      (not available) 
      Documentation 

15    Combined HW/SW Operational        - Customer specifications and reliability growth function 
      Availability 
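
The numeric standards in Table A-2 lend themselves to a simple automated check. The following Python sketch is illustrative only: the function names and the sample inputs are assumptions, and it covers just the three standards stated as fixed numbers (defect density, cyclomatic complexity and mean time to failure). The table itself notes that the last two depend on the application and on system requirements, so the thresholds would need tailoring in practice.

```python
def defect_density_per_kloc(defects, lines_of_code):
    """Defects per 1,000 lines of code."""
    return defects / (lines_of_code / 1000)


def meets_standards(defects, lines_of_code, cyclomatic_complexity, mttf_hours):
    """Compare one component against three numeric standards from Table A-2.

    The thresholds are taken from the table; treating them as hard limits
    is an assumption made for this sketch.
    """
    return {
        "defect_density": defect_density_per_kloc(defects, lines_of_code) < 0.1,
        "cyclomatic_complexity": cyclomatic_complexity <= 10,
        "mean_time_to_failure": mttf_hours >= 2000,
    }


# Hypothetical component: 1 defect in 20,000 LOC, complexity 8, MTTF 2,500 hours.
print(meets_standards(1, 20000, 8, 2500))
```

A result dictionary rather than a single pass/fail flag keeps each standard visible, which matches the table's view of the measures as separate indicators.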
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SAMPLE SOFTWARE DEFECT REPORT 

The attached pages serve as a sample software defect report 
for use in gathering discrepancy data on software reuse modules. 
The information contained in these reports was compiled from 
several sources. Keller (Keller, 1993), who manages and 
coordinates Shuttle software, provides insightful information on 
the defect data that IBM collects as part of its software quality 
management program. Florac (Florac, 1992), who has worked on problem 
and defect counting at SEI, provides a general format as well as 
other items of interest in defect reporting. ANSI/AIAA (ANSI/AIAA, 
1992), which has published a standard on practices for software 
reliability, contributes information on collecting discrepancy 
correction information. 

As noted earlier, this report format and the information it 
contains serve as a starting point for discrepancy data 
collection. Although not comprehensive, the format includes enough 
information to support the management of a software quality 
program. 
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SOFTWARE COMPONENT DEFECT REPORT 



Page 1 

Date Observed:                          Defect Report No.: 

Organization's Name: 

Point of Contact: 


TYPE OF DEFECT 

SOFTWARE DEFECTS           X      OTHER DEFECTS              X 
Requirements              [ ]     Hardware                  [ ] 
Code                      [ ]     User Mistake              [ ] 
Test Case                 [ ]     Operating System          [ ] 
Design                    [ ]     Operations Mistake        [ ] 
User Manual               [ ] 


DEFECT FINDING ACTIVITY 

INTEGRATION OF:            X      FORMAL REVIEW OF:          X 
Design                    [ ]     Plans                     [ ] 
Code                      [ ]     Requirements              [ ] 
Test Procedure            [ ]     Preliminary Design        [ ] 
User Publications         [ ]     Critical Design           [ ] 
INSPECTIONS OF:                   Test Readiness            [ ] 
Requirements              [ ]     Formal Qualification      [ ] 
Preliminary Design        [ ]     TESTING 
Detailed Design           [ ]     Test Planning             [ ] 
Code                      [ ]     Module Testing            [ ] 
Operational Document      [ ]     Component Testing         [ ] 
Test Procedures           [ ]     Integration and Testing   [ ] 
CUSTOMER SUPPORT                  Independent V & V         [ ] 
Production/Deployment     [ ]     Testing and Evaluation    [ ] 
Installation              [ ]     Acceptance Testing        [ ] 
Operation                 [ ]     System Error Message      [ ] 
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SOFTWARE COMPONENT DEFECT REPORT 



Page 2 

DEFECT RELATED INFORMATION 

FINDING MODE                            X      RESOLUTION OF DEFECT        X 
Static (non-operational)               [ ]     Fixed                      [ ] 
Dynamic (operational)                  [ ]     Waived with Workaround     [ ] 
Unknown                                [ ]     Requirements Changed       [ ] 
SEVERITY                                       Not a Problem              [ ] 
SEVERE (must be fixed)                 [ ]     DEFECT RELATED TO A 
                                               PREVIOUS CHANGE 
MAJOR (affects software performance)   [ ]     Yes (Date:          )      [ ] 
MINOR (workaround available)           [ ]     No                         [ ] 
INSIGNIFICANT (not visible to user)    [ ]     Can't Tell                 [ ] 
TIME TO ISOLATE DEFECT                         CHANGE RECOMMENDATIONS 
1 Hour or Less                         [ ]     Fault Correction           [ ] 
1 Hour to 1 Day                        [ ]     Design Correction          [ ] 
More than 1 Day                        [ ]     Clerical Correction        [ ] 
Never Found                            [ ]     Specification Correction   [ ] 
                                               Documentation Correction   [ ] 
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SOFTWARE COMPONENT DEFECT REPORT 



Page 3 

DEFECT DETECTION/PREVENTION INFORMATION 

Identify which software lifecycle phase should have caught 
this defect and explain why it was not found. 

o  Requirements Evaluation: 

o  Design Inspection: 

o  Code Inspection: 

o  Development Testing: 

o  Performance Testing: 

o  Other: 
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SOFTWARE COMPONENT DEFECT REPORT 



COMPONENT DATA 

Software Size (in LOC):                         ____________ 
Source Language Used:                           ____________ 


COMPONENT FAILURE DATA (provide at least one) 

CPU Hours Since Last Failure:                   ____________ 
Wall Clock Hours Since Last Failure:            ____________ 
Number of Runs or Tests Since Last Failure:     ____________ 
Test Hours per Test Interval:                   ____________ 
Number of Failures in Test Interval (above):    ____________ 
Test Labor Hours Since Last Failure:            ____________ 


DEFECT CORRECTION DATA 

Date and Time Correction Made:                  ____________ 
Labor Hours to Make Correction:                 ____________ 

Provide one of the following: 

CPU Hours to Fix Defect:                        ____________ 
Number of Runs to Effect Fix:                   ____________ 
Wall Clock Hours to Effect Fix:                 ____________ 
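
If the report were collected electronically rather than on paper, a subset of its fields might be captured in a record such as the one sketched below. This is an illustration only: the class and field names, the choice of fields, and the sample values are assumptions, not part of the report format described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Severity levels as listed on page 2 of the report."""
    SEVERE = "must be fixed"
    MAJOR = "affects software performance"
    MINOR = "workaround available"
    INSIGNIFICANT = "not visible to user"


@dataclass
class DefectReport:
    """Hypothetical record holding a few fields of the defect report."""
    report_no: str
    date_observed: str
    severity: Severity
    size_loc: int
    source_language: str
    cpu_hours_since_last_failure: Optional[float] = None
    wall_clock_hours_since_last_failure: Optional[float] = None
    labor_hours_to_correct: Optional[float] = None

    def has_failure_data(self) -> bool:
        """The form asks for at least one failure-data entry; two are modeled here."""
        return (self.cpu_hours_since_last_failure is not None
                or self.wall_clock_hours_since_last_failure is not None)


# Hypothetical report for an Ada component.
report = DefectReport("DR-001", "1994-03-01", Severity.MAJOR, 1200, "Ada",
                      wall_clock_hours_since_last_failure=36.5)
print(report.has_failure_data())
```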
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APPENDIX C 



AMPLIFICATION OF THE SOFTWARE METRICS VALIDATION METHODOLOGY 
A. INTRODUCTION 

A predictive metric is considered valid if and only if it is 
proven to possess a high degree of correlation with the quality 
factor it replaces. Additionally, a predictive metric may prove to 
be valid with respect to only a subset of the six metrics 
validation criteria: correlation, tracking, consistency, 
predictability, discriminative power and reliability 
(Schneidewind, 1992). 

Before starting this discussion, it is useful to establish an 
understanding of the rationale for metrics validation. 
Schneidewind points out that the purpose of software metrics 
validation is to prove the following: 

$$\text{IF } R[M] \Leftrightarrow R[F] \text{ THEN } \{R[M] \Leftrightarrow R[F]\} \Rightarrow \{R[M'] \Rightarrow R[F']\} \qquad (1)$$ [7] 

Consider a project $P_1$ from which some metric (M) and quality factor 
(F) [6] have been collected. This relation then suggests that if 
some relation (R) between F and M on $P_1$ can be statistically 
validated with respect to some validity criteria, subject to a 
threshold value $\beta$ and confidence level of $\alpha$, then the R in $P_1$ 
should hold true in another project $P_2$. Concisely stated, if M 
can be mapped to F on $P_1$ and validated, then M' should map to F' on 
$P_2$. Hence, the essence of the validation process is to validate 
M with respect to one or more of the validity criteria using a 
threshold value $\beta$ and a confidence level of $\alpha$ such that (1) holds 
true (Schneidewind, 1992). 

The following examples illustrate metrics validation with respect 
to each of the six validity criteria. These examples are drawn 
from the publications of Schneidewind (Schneidewind, 1992) and IEEE 
(IEEE, 1992). Terms in parentheses indicate an alternate name for 
a specific validity criterion. 

B. METRICS VALIDATION EXAMPLES 

1. ASSOCIATION (CORRELATION) 

Given $R^2$ (where R is the linear correlation coefficient for 
F and M), which represents the variation in F due to variations in 
M, and $\beta$, which represents some threshold value of $R^2$, the 
association test for a metric specifies that $R^2 > \beta$ must hold true 
at a given confidence level $\alpha$. This test seeks to show that 
sufficient linear correlation exists between F and M such that M can 
be used as a substitute for F when F is unknown (Schneidewind, 
1992). 

[6] F and M will be used hereafter in place of the terms Quality 
Factor and Metric respectively. 

[7] The reader is directed to Figure 7.1 in Chapter VII for an 
illustration of this methodology. 

For example, if R = 0.7 for metric number-of-statements 
and quality factor defect-report-count taken from a sample software 
component, then $R^2$ = 0.49, which suggests that 49 percent of the 
variation in the number of defect reports is explained by the 
number of statements in this module. If $\beta$ were chosen to be 
0.7 or greater, then metric number-of-statements would fail the 
validity test with respect to the association validity criterion. 
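
The association test can be sketched in a few lines of Python. The sketch below is illustrative rather than a reproduction of Schneidewind's procedure: the function names and the sample data are assumptions, and it checks only the threshold condition (R squared greater than the threshold), leaving out the confidence-level test.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient R between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def passes_association(metric, factor, beta):
    """Return (R squared, whether R squared exceeds the threshold beta)."""
    r_squared = pearson_r(metric, factor) ** 2
    return r_squared, r_squared > beta


# Hypothetical per-component data: statement counts and defect report counts.
statements = [120, 340, 560, 800, 1500]
defect_reports = [2, 3, 7, 6, 14]
r_squared, valid = passes_association(statements, defect_reports, beta=0.7)
print(round(r_squared, 3), valid)
```

With R = 0.7, as in the example above, the same comparison gives 0.49 against a threshold of 0.7, so the metric would be rejected.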

2. TRACKING 

Metric validation with respect to tracking is used to 
determine the ability of M to change in unison with F for a given 
component over a period of time. Schneidewind describes this 
relationship for some component $i$ with associated metric $M_i$ and 
quality factor $F_i$ using the following notation: 

$$M_i(T_1) > M_i(T_2) \Leftrightarrow F_i(T_1) > F_i(T_2) \quad (\text{where } T_2 > T_1)$$ 

$$M_i(T_1) = M_i(T_2) \Leftrightarrow F_i(T_1) = F_i(T_2)$$ 

$$M_i(T_1) < M_i(T_2) \Leftrightarrow F_i(T_1) < F_i(T_2) \qquad (2)$$ 

In essence, in order for a metric to be valid with respect to the 
tracking criterion, any change in F between times $T_1$ and $T_2$ must be 
accompanied by a proportional change in the same direction of M. 
If M can be proven to behave according to the properties outlined 
in (2), it can then be used as an indirect measure of F when F is 
unknown (Schneidewind, 1992). 

For example, consider project $P_1$ where number-of-statements 
and defect-report-count are given as $M_1$ = 4000 and $F_1$ = 40 at 
time $T_1$ and $M_1$ = 2000 and $F_1$ = 20 at time $T_2$. From this 
information it appears that M changes proportionally and in the 
same direction as F. If this relation between F and M is proven to 
hold over a representative sample of software components, then 
number-of-statements can be considered suitable for tracking 
defect-report-count over the project's lifecycle. 
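
A minimal Python sketch of the directional part of the tracking test follows. The function names are assumptions, and the check covers only whether M and F move in the same direction between two times; it does not test proportionality or statistical confidence.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def tracks(metric_t1, metric_t2, factor_t1, factor_t2):
    """True if the metric and the quality factor changed in the same direction."""
    return sign(metric_t2 - metric_t1) == sign(factor_t2 - factor_t1)


# Values from the example: statements fall from 4000 to 2000
# while defect reports fall from 40 to 20.
print(tracks(4000, 2000, 40, 20))
```

In practice the check would be repeated over a representative sample of components before the metric is accepted for tracking.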



3. CONSISTENCY 



The consistency validity test proves that the rank 
ordering of a set of metrics associated with a set of projects 
correlates to the rank ordering of the quality factors associated 
with the same set of projects (Schneidewind, 1992). 

For example, consider projects $P_1$, $P_2$ and $P_3$ where 
number-of-statements for each project is given as $M_1$ = 4000, 
$M_2$ = 2000 and $M_3$ = 1500 respectively. Here, the rank ordering 
based on a low value being most desirable is: $M_1 > M_2 > M_3$. In 
order for number-of-statements to be valid, the defect-report-count 
for the group of projects must demonstrate the same rank ordering 
(i.e., $F_1 > F_2 > F_3$). 

Metrics are validated with respect to the consistency 
criterion by ensuring the rank correlation coefficient r between F 
and M exceeds a predefined threshold with a certain level of 
confidence. Specifically, the following must hold true 
(Schneidewind, 1992): 

$$r > \beta_{critical} \text{ with a given } \alpha \qquad (3)$$ 

Consider the example where $\beta_{critical}$ = 0.6, $\alpha$ = .05 and, for 
number-of-statements and defect-report-count, r = .7 with $\alpha$ = .05. 
Since $r > \beta_{critical}$ with an acceptable confidence level, if this 70% 
ranking for F and M is proven to exist over a representative sample 
of software components, then number-of-statements appears to be 
consistent with defect-report-count and can be used in ranking 
associated components in terms of quality. 
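
The rank comparison can be illustrated with Spearman's rank correlation coefficient, one common choice of rank correlation. The Python sketch below uses the textbook formula for untied data; the function names and the defect counts are assumptions, and the significance test against the confidence level is omitted.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(metric, factor):
    """Spearman rank correlation coefficient for untied data."""
    n = len(metric)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(ranks(metric), ranks(factor)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Statement counts from the example; the defect counts are hypothetical.
statements = [4000, 2000, 1500]
defect_reports = [40, 20, 12]
r = spearman_r(statements, defect_reports)
print(r, r > 0.6)
```

Three projects are far too few for a meaningful significance test; the sample is kept small only to mirror the example.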



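4. PREDICTABILITY 

In order for M to satisfy the predictability criterion it 
must satisfy the following condition: 

$$\frac{|F_{aT_2} - F_{pT_2}|}{F_{aT_2}} < \beta_p \qquad (4)$$ 

where $F_{aT_2}$ is the actual and $F_{pT_2}$ the predicted value of the 
quality factor at time $T_2$. In essence, for some function f(M') 
using metric M' collected at time $T_1$, f(M') must be able to predict 
the quality factor at time $T_2$ with an accuracy of $\beta_p$ 
(Schneidewind, 1992). 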

Consider project $P_1$ where at $T_1$ number-of-statements, 
given as M' = 5000, predicts defect-report-count $F_p$ = 30 where 
project standards require $\beta_p$ = .20. In order for M' to be valid 
with respect to the predictability criterion, $F_a$ must be less than 
50. If the application of M' using a representative sample of 
components shows that M' meets the requirements of (4), then 
number-of-statements can be considered a suitable predictor of 
defect-report-count and can be applied in the context of software 
quality control. 
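
The predictability check can be sketched in Python under the assumption that the criterion takes the relative-error form, that is, the absolute difference between the actual and predicted quality factor values divided by the actual value must fall below the accuracy threshold. The function names and the sample values below are hypothetical and are not taken from the example above.

```python
def prediction_error(actual, predicted):
    """Relative error of the predicted quality factor against the actual one."""
    return abs(actual - predicted) / actual


def passes_predictability(actual, predicted, beta_p):
    """True if the relative prediction error is below the threshold beta_p."""
    return prediction_error(actual, predicted) < beta_p


# Hypothetical values: 40 defect reports predicted, 45 actually observed.
print(passes_predictability(actual=45, predicted=40, beta_p=0.20))
# Hypothetical values: 40 predicted, 60 observed.
print(passes_predictability(actual=60, predicted=40, beta_p=0.20))
```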

5. DISCRIMINATIVE POWER 

To meet the discriminative power test, Schneidewind 
points out that a critical metric value $M_c$ for a given critical 
quality factor value $F_c$ must be able to classify metric $M_i$ from 
component $i$ with a specified $\alpha$ such that: 

$$M_i > M_c \Leftrightarrow F_i > F_c, \text{ and}$$ 

$$M_i \le M_c \Leftrightarrow F_i \le F_c \qquad (5)$$ 

In short, $M_c$ must be able to distinguish between high and low 
quality components with a given level of confidence 
(Schneidewind, 1992). 

IEEE suggests the use of the Mann-Whitney Test and Chi- 
square test (contingency table) for this type of metric validation 
(IEEE, 1992) . Table C-1 will be used to illustrate the application 
of contingency table analysis to a project. 

TABLE C-1 CONTINGENCY TABLE 

                 M_i ≤ M_c                   M_i > M_c 

F_i ≤ F_c        O_11 = [value illegible]    O_12 = 0 

F_i > F_c        O_21 = 1                    O_22 = 4 

O_ij = count of observations in cell i,j 



The values used in Table C-1 illustrate that for project 
$P_1$, one component is observed to have passed the 
acceptable quality test ($M_i \le M_c$) yet failed the quality factor 
test ($F_i > F_c$). While a perfect $M_c$ is difficult to find, the 
objective is to validate a metric with respect to the 
discriminative power criterion after (5) is proven to hold true 
over a representative sample of components. If this is the case, 
then M can serve as a discriminator of quality in various quality 
functions. 
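
The contingency table analysis can be sketched with a chi-square statistic computed by hand for a 2 x 2 table. The following Python example is an illustration under stated assumptions: the function name and the cell counts are hypothetical (they are not the counts of Table C-1), and 3.841 is the standard chi-square critical value for one degree of freedom at a significance level of .05.

```python
def chi_square_2x2(table):
    """Chi-square statistic for a 2 x 2 table of observed counts.

    Assumes every row total and column total is greater than zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, alpha = .05

# Hypothetical counts: rows are F <= Fc and F > Fc,
# columns are M <= Mc and M > Mc.
observed = [[20, 3], [4, 15]]
chi2 = chi_square_2x2(observed)
print(round(chi2, 2), chi2 > CRITICAL_VALUE)
```

With very small cell counts the chi-square approximation becomes unreliable, which is one reason a representative sample of components is needed before the critical metric value is accepted.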

6. REPEATABILITY (RELIABILITY) 

A metric passes the repeatability criterion if it 
demonstrates a given percentage rate of success when validated with 
respect to one or more of the validity criteria described above. 
Specifically, Schneidewind proposes that for some M, 
the following must hold true over a given set of validity criteria 
(Schneidewind, 1992): 

$$N_s / N_t > \beta \qquad (6)$$ 

Here, $N_s$ represents the number of successful validations of M, $N_t$ 
represents the total number of validity tests M is subjected to and 
$\beta$ represents a threshold evaluation criterion. Thus, this 
relation states that the percentage of successful validations of a 
metric with respect to a given set of criteria must exceed a 
certain critical value in order to provide confidence in that 
metric's use in software quality functions. 
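
As a small illustration, the repeatability ratio can be computed from a list of validation outcomes. The function name, the outcome list and the threshold in this Python sketch are assumptions.

```python
def passes_repeatability(outcomes, beta):
    """True if the share of successful validations exceeds the threshold beta.

    outcomes is a non-empty list of booleans, one per validity test.
    """
    successes = sum(1 for passed in outcomes if passed)
    return successes / len(outcomes) > beta


# Hypothetical record: the metric passed three of four validity tests.
print(passes_repeatability([True, True, True, False], beta=0.7))
```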

C. CONCLUSIONS 

The intent of this appendix is to provide the reader with an 
example of metrics validation with respect to each of the six 
validity criteria. Further information is available in the 
writings of Schneidewind (Schneidewind, 1992) and IEEE (IEEE, 
1992). In particular, Schneidewind provides a good discussion of 
the purpose and use of metrics. In his paper he provides a useful 
table (Appendix A) which correlates the quality functions 
assessment, control and prediction to the six validity criteria 
and gives examples of the statistical methods by which metrics can 
be validated. 
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APPENDIX D 



CHAPTER VII TABLES AND FIGURES 

TABLE D-1 SHUTTLE SOFTWARE METRICS AND QUALITY FACTOR 
(Schneidewind, 1993) 



Metric            Metric Description 

eta1              unique operator count 
eta2              unique operand count 
n1                total operator count 
n2                total operand count 
stmts             total statement count 
loc               total non-commented lines of code 
comments          total comment count 
nodes             total node count (in control graph) 
edges             total edge count (in control graph) 
paths             total path count (in control graph) 
cycles            total cycle count (in control graph) 
maxpath           maximum path length (edges in control graph) 
avepath           average path length (edges in control graph) 

Quality Factor    Quality Factor Description 

drcount           discrepancy reports covering discrepancies (defects) 
                  between planned and actual requirements, design, and 
                  code as obtained from inspection of the documentation 
                  and test of the code 
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TABLE D-2 SMOOTHED METRICS DATA FOR 12 CLASSES 
(Schneidewind, 1993) 



         stmts                       nodes                      drcount 
Class    Range     Ave      S.D.     Range    Ave      S.D.     Range   Ave     S.D. 

1        1-34      8.97     9.18     3-288    9.86     20.89    0-31    .65     2.17 
2        35-68     48.72    10.14    3-60     21.82    13.68    0-13    1.57    2.33 
3        69-103    85.12    10.62    3-79     35.35    18.91    0-13    2.08    2.57 
4        104-137   119.58   9.30     5-119    45.95    28.10    0-18    2.79    3.84 
5        138-171   156.55   9.28     5-147    56.13    39.34    0-22    4.00    4.43 
6        172-206   189.70   10.83    5-167    75.08    44.02    0-13    3.95    4.15 
7        207-240   222.77   9.48     5-156    90.64    40.14    0-13    4.91    3.22 
8        241-275   254.37   11.50    5-166    71.04    61.04    0-12    3.95    4.09 
9        276-309   294.20   10.56    5-147    95.67    56.17    0-34    5.67    8.25 
10       310-343   320.22   10.47    5-171    65.33    59.04    0-10    4.22    3.90 
11       344-378   357.86   9.70     5-338    156.00   81.61    1-37    9.93    9.36 
12       379-412   397.88   6.22     5-232    145.25   94.17    1-26    10.38   9.55 



TABLE D-3 DISCRIMINATIVE POWER VALIDITY EVALUATION 
(Schneidewind, 1993) 



Dc   Sc   Nc   P1      P2     P12    I      RI     RFP    RFD     RMP    χ²c 

0    8    -    4.72    27.3   32.0   63.8   1.34   9.89   .183    4.72   258 
0    -    9    12.6    17.8   30.4   46.4   1.61   28.6   .528    12.6   208 
0    8    9    2.79    29.5   32.3   67.9   1.31   4.11   .0759   2.79   286 
1    48   21   7.23    20.0   27.2   42.3   1.11   18.1   .335    12.0   261 
2    85   35   6.73    16.8   23.5   31.3   .86    27.8   .513    18.2   237 

Dc: Calculated critical value of drcount 

Sc: Calculated critical value of stmts 

Nc: Calculated critical value of nodes 

χ²c: Calculated chi-square 
[Plot: vertical-axis ticks 0.52, 0.32, 0.12, -0.08, -0.28] 

Figure D-1 Principal Components Weights (Schneidewind, 1993) 

[Plot: horizontal axis Component 1, ticks 0.2 to 0.32; labeled points: eta2, comments, drcount, eta1, loc, paths, cycles, edges, nodes, avepath, maxpath] 

Figure D-2 Major Contributors to Components (Schneidewind, 1993) 
[Plot: vertical axis Defect Report Count] 




Figure D-3 Defects vs. Statements (Schneidewind, 1993) 



Average Defect Count in Classes versus 
Average Statement Count in Classes 




Figure D-4 Average Defects vs. Average Statements 
(Schneidewind, 1993) 
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Average Defect Count in Classes versus 
Average Node Count in Classes 







Figure D-5 Average Defects vs. Average Nodes 
(Schneidewind, 1993) 



Quality: Remaining drcount and Modules 
with drcount > 0, after Inspection 

[Plot: axis label Percent] 

Figure D-6 Quality vs. Inspection (Schneidewind, 1993) 
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Average Defect Report Count versus Average 
Statement Count and Average Node Count 




Figure D-7 Defects vs. Statements and Nodes (Schneidewind, 1993) 



avedrcount = exp (.242 + (.00523 * avestmts) ) 







Figure D-8 Avedrcount vs. Avestmts (Schneidewind, 1993) 






Average Defect Count = -.262 + (.0658 * Average Node Count) 




Figure D-9 Average Defects vs. Average Nodes 
(Schneidewind, 1993) 



Avedrcount = exp (.348 + (.00194 * avestmts) + 
(.00826 * avenodes) ) 




Figure D-10 Actual and Predicted Avedrcount (Schneidewind, 1993) 
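
The three fitted equations shown with Figures D-8, D-9 and D-10 can be evaluated directly. The Python sketch below transcribes the coefficients as printed; the function names are assumptions, and the sample inputs are class averages taken from Table D-2 (classes 1, 6 and 12).

```python
import math


def avedrcount_from_stmts(avestmts):
    """Equation shown with Figure D-8."""
    return math.exp(0.242 + 0.00523 * avestmts)


def avedrcount_from_nodes(avenodes):
    """Equation shown with Figure D-9."""
    return -0.262 + 0.0658 * avenodes


def avedrcount_from_both(avestmts, avenodes):
    """Equation shown with Figure D-10."""
    return math.exp(0.348 + 0.00194 * avestmts + 0.00826 * avenodes)


# (average stmts, average nodes, observed average drcount) from Table D-2.
samples = [(8.97, 9.86, 0.65), (189.70, 75.08, 3.95), (397.88, 145.25, 10.38)]
for avestmts, avenodes, observed in samples:
    print(observed,
          round(avedrcount_from_stmts(avestmts), 2),
          round(avedrcount_from_nodes(avenodes), 2),
          round(avedrcount_from_both(avestmts, avenodes), 2))
```

Printing the observed average beside the three predictions gives a quick sense of how closely each equation follows the smoothed data.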
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